SultanR/SmolTulu-1.7b-RM

Text Classification

text-generation

text-generation-inference

SultanR committed on Dec 17, 2024

Commit dd6950d (verified) · 1 Parent(s): 767fb6b

Update README.md

Files changed (1)

README.md +12 -7

@@ -64,13 +64,18 @@ def get_reward(prompt, completion):
 ## Training Details
-The reward model was trained using:
-- Learning rate: 3 × 10⁻⁶
-- Gradient norm threshold: 1.0
-- Learning rate schedule: Linear
-- Batch size (effective): 256
-- Max token length: 2,048
-- Number of epochs: 1
+The reward model was trained with the following settings:
+- Base model: SmolTulu-1.7b-Instruct
+- Mixed precision: bfloat16
+- Learning rate: 4e-5
+- Effective batch size: 4
+- Maximum sequence length: 2048 tokens
+- Maximum prompt length: 2048 tokens
+- Training epochs: 1
+- Training data: Tulu 3 8B preference mixture
+- Evaluation data: UltraFeedback (cleaned)
+- Gradient checkpointing enabled
+- DeepSpeed Zero-3 for memory optimization
 ## Citation
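
For reviewers who want the new settings in machine-readable form, here is a minimal sketch that collects the values from the added lines into a plain Python dataclass. The class name, field names, and the `effective_batch_size` helper are hypothetical and are not taken from the training script. The README states only the effective batch size (4), so any split into per-device batch size, gradient accumulation steps, and device count shown here is an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RewardTrainingSettings:
    """Values copied from the updated README; field names are hypothetical."""

    base_model: str = "SmolTulu-1.7b-Instruct"
    mixed_precision: str = "bfloat16"
    learning_rate: float = 4e-5
    effective_batch_size: int = 4
    max_sequence_length: int = 2048
    max_prompt_length: int = 2048
    num_epochs: int = 1
    gradient_checkpointing: bool = True
    deepspeed_zero_stage: int = 3


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Effective batch size is the product of the three factors."""
    return per_device * grad_accum_steps * num_devices


settings = RewardTrainingSettings()

# Assumed decomposition (not stated in the README): 1 sample per device,
# no gradient accumulation, 4 devices.
assert effective_batch_size(1, 1, 4) == settings.effective_batch_size

print(asdict(settings))
```

Other decompositions, such as 2 samples per device with 2 accumulation steps on a single device, give the same effective batch size; the commit does not say which one was used.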