hamishivi committed on
Commit 5f49edf · verified · 1 Parent(s): bf0c1c8

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -20,6 +20,7 @@ Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in
 Check out the OLMo 2 paper (forthcoming) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
 
 This reward model was used to initialize value models during RLVR training for both 7B and 13B RLVR training.
+Note we used a slightly different mix to the final mixture used for DPO training for this RM.
 
 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
 These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.