Has this released checkpoint already been finetuned following the three steps outlined in the InstructGPT paper?
#2 · opened by Eamymao
The README says this model is finetuned on webgpt and prompt_dialogue (v2), but it doesn't explain how the finetuning was done. So it is unclear whether this model went through the RLHF steps described in InstructGPT, and what the finetuning process actually was. Does anyone know more about this?