Commit dd6950d by SultanR · verified · 1 parent: 767fb6b

Update README.md

Files changed (1)
  1. README.md +12 -7
README.md CHANGED
@@ -64,13 +64,18 @@ def get_reward(prompt, completion):
 
 ## Training Details
 
-The reward model was trained using:
-- Learning rate: 3 × 10⁻⁶
-- Gradient norm threshold: 1.0
-- Learning rate schedule: Linear
-- Batch size (effective): 256
-- Max token length: 2,048
-- Number of epochs: 1
+The reward model was trained with the following settings:
+- Base model: SmolTulu-1.7b-Instruct
+- Mixed precision: bfloat16
+- Learning rate: 4e-5
+- Effective batch size: 4
+- Maximum sequence length: 2048 tokens
+- Maximum prompt length: 2048 tokens
+- Training epochs: 1
+- Training data: Tulu 3 8B preference mixture
+- Evaluation data: UltraFeedback (cleaned)
+- Gradient checkpointing enabled
+- DeepSpeed Zero-3 for memory optimization
 
 ## Citation
 
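To make the hyperparameter changes in this commit easier to scan, here is a minimal Python sketch that transcribes both versions of the Training Details list into dictionaries and classifies each setting as added, removed, changed, or unchanged. The key names (`learning_rate`, `effective_batch_size`, and so on) and the `diff_settings` helper are hypothetical, chosen only for this illustration; they are not the config schema of whatever training framework was actually used. Pairs such as "Max token length" and "Maximum sequence length" are assumed to describe the same setting.

```python
# Settings transcribed from the README diff in this commit.
# Key names are hypothetical normalizations for this sketch, not a real config schema.
OLD_SETTINGS = {
    "learning_rate": 3e-6,
    "max_grad_norm": 1.0,          # "Gradient norm threshold"
    "lr_schedule": "linear",
    "effective_batch_size": 256,
    "max_seq_length": 2048,        # assumed to match "Maximum sequence length"
    "num_epochs": 1,
}

NEW_SETTINGS = {
    "base_model": "SmolTulu-1.7b-Instruct",
    "mixed_precision": "bfloat16",
    "learning_rate": 4e-5,
    "effective_batch_size": 4,
    "max_seq_length": 2048,
    "max_prompt_length": 2048,
    "num_epochs": 1,
    "train_data": "Tulu 3 8B preference mixture",
    "eval_data": "UltraFeedback (cleaned)",
    "gradient_checkpointing": True,
    "deepspeed": "ZeRO-3",
}


def diff_settings(old: dict, new: dict) -> dict:
    """Classify keys as added, removed, changed, or unchanged between two settings dicts."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    shared = old.keys() & new.keys()
    # Keep (old, new) value pairs only for shared keys whose values differ.
    changed = {k: (old[k], new[k]) for k in sorted(shared) if old[k] != new[k]}
    unchanged = sorted(k for k in shared if old[k] == new[k])
    return {"added": added, "removed": removed, "changed": changed, "unchanged": unchanged}


if __name__ == "__main__":
    for kind, value in diff_settings(OLD_SETTINGS, NEW_SETTINGS).items():
        print(f"{kind}: {value}")
```

Here "removed" only means a setting is no longer listed in the README; the diff alone does not say whether gradient clipping at 1.0 or a linear schedule was actually dropped from training. The two settings whose stated values changed are the learning rate (3e-6 to 4e-5) and the effective batch size (256 to 4).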