This model was trained from LLaMA 3.1 8B Instruct on the dataset hendrydong/preference_700K (preprocessed version: RyanYr/preference_700K_llama31_tokenized). The training script is https://github.com/yurun-yuan/RLHF-Reward-Modeling/blob/4b827117dc9a85062c396eb62200b48e6dbfd596/bradley-terry-rm/llama3_rm.py
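The script path (bradley-terry-rm) suggests a Bradley-Terry pairwise objective, in which the reward of the preferred response is pushed above the reward of the rejected one. The sketch below illustrates that loss in plain Python as an assumption about the general technique; it is not taken from the linked script, and the function name and inputs are hypothetical.

```python
import math


def bradley_terry_loss(chosen_rewards, rejected_rewards):
    """Mean negative log-likelihood that each chosen response beats its rejected pair.

    Hypothetical helper for illustration: inputs are scalar rewards that a
    reward model would assign to the chosen and rejected responses.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("each chosen reward needs a matching rejected reward")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # A correctly ordered pair gives a lower loss than a mis-ordered one.
    print(bradley_terry_loss([2.0], [0.0]))
    print(bradley_terry_loss([0.0], [2.0]))
```

The loss depends only on the reward difference within each pair, so adding a constant to every reward leaves it unchanged.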
The following hyperparameters were used during training: