Update README.md
README.md CHANGED

@@ -20,6 +20,7 @@ Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in
 Check out the OLMo 2 paper (forthcoming) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
 
 This reward model was used to initialize value models during RLVR training for both 7B and 13B RLVR training.
+Note we used a slightly different mix to the final mixture used for DPO training for this RM.
 
 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
 These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.