Update README.md
README.md CHANGED
@@ -64,13 +64,18 @@ def get_reward(prompt, completion):
 
 ## Training Details
 
-The reward model was trained
-- …
-- …
-- Learning rate
-- …
-- …
-- …
+The reward model was trained with the following settings:
+- Base model: SmolTulu-1.7b-Instruct
+- Mixed precision: bfloat16
+- Learning rate: 4e-5
+- Effective batch size: 4
+- Maximum sequence length: 2048 tokens
+- Maximum prompt length: 2048 tokens
+- Training epochs: 1
+- Training data: Tulu 3 8B preference mixture
+- Evaluation data: UltraFeedback (cleaned)
+- Gradient checkpointing enabled
+- DeepSpeed Zero-3 for memory optimization
 
 ## Citation
 
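For illustration, here is a minimal sketch of how the settings listed in the new section could be wired up with TRL's `RewardTrainer`. This is not the commit's actual training script: the Hub model/dataset IDs, the DeepSpeed config filename, and the exact TRL keyword names (which vary across TRL versions) are assumptions.

```python
# Hypothetical reproduction sketch of the training settings above (TRL >= 0.12).
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_id = "SultanR/SmolTulu-1.7b-Instruct"  # assumed Hub ID of the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Scalar reward head on top of the base model.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)

# Assumed Hub IDs for "Tulu 3 8B preference mixture" and "UltraFeedback (cleaned)".
train_ds = load_dataset("allenai/llama-3.1-tulu-3-8b-preference-mixture", split="train")
eval_ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")

args = RewardConfig(
    output_dir="smoltulu-reward-model",
    learning_rate=4e-5,
    per_device_train_batch_size=4,  # effective batch size of 4 (single device, no accumulation)
    num_train_epochs=1,
    bf16=True,                      # bfloat16 mixed precision
    gradient_checkpointing=True,
    max_length=2048,                # max sequence length; prompt length is capped the same way
    deepspeed="ds_zero3.json",      # assumed path to a DeepSpeed ZeRO-3 config file
)

trainer = RewardTrainer(
    model=model,
    processing_class=tokenizer,     # older TRL versions use tokenizer= instead
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```

Gradient checkpointing and ZeRO-3 both trade compute for memory, which is consistent with training a 1.7B reward model at 2048-token sequences with the small effective batch size listed above.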