SFT Models
Collection
We train a series of SFT models on the high-quality SFT dataset of RLHFlow for research purpose.
•
6 items
•
Updated
•
1
This is the SFT checkpoint used for the project RLHFlow/Online-RLHF
The model is trained from meta-llama/Meta-Llama-3-8B on RLHFlow/RLHFlow-SFT-Dataset-ver2 for 1 epoch. We use a global batch size of 128 and a learning rate of 2e-5, where we pack the samples and split them into chunks of 8192 token. See more training details at https://github.com/RLHFlow/Online-RLHF/blob/main/sft/llama3-8b-it.yaml .
Please cite our techical report if you find our model is useful for your research or product.
@misc{dong2024rlhf,
title={RLHF Workflow: From Reward Modeling to Online RLHF},
author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
year={2024},
eprint={2405.07863},
archivePrefix={arXiv},
primaryClass={cs.LG}
}