Update README.md
README.md CHANGED

@@ -20,6 +20,7 @@ Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in
 Check out the OLMo 2 paper (forthcoming) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
 
 This reward model was used to initialize value models during RLVR training for both 7B and 13B RLVR training.
+Note we used a slightly different mix to the final mixture used for DPO training for this RM.
 
 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
 These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.