OpenRLHF/Llama-3-8b-rm-700k


chuyi777 committed on Jul 17

Commit a45fd4a • 1 Parent(s): 6c2b778

Update README.md

Files changed (1)

README.md +2 -0

@@ -1,5 +1,7 @@
 The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_700K.
+Base model: https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture
 ```
 Cosine Scheduler
 Learning Rate: 9e-6