Update README.md

Files changed (1)

README.md

@@ -35,7 +35,8 @@ Difference between this mixture and that of
 ### Training
-We train the model for one epoch with a learning rate of 5e-6, batch size 512, cosine learning rate decay with a warmup ratio 0.03. You can see my training script here: https://github.com/WeiXiongUST/RAFT-Reward-Ranked-Finetuning/blob/main/reward_modeling.py , which is modified from the TRL package.
+We train the model for one epoch with a learning rate of 5e-6, batch size 512, cosine learning rate decay with a warmup ratio 0.03.
+
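For readers who want to see what the schedule in the Training paragraph looks like in practice, here is a minimal sketch of linear warmup followed by cosine decay, using the stated peak learning rate (5e-6) and warmup ratio (0.03). The README does not give the number of optimizer steps or the scheduler implementation, so `TOTAL_STEPS` and the function `lr_at_step` are hypothetical and chosen only for illustration; the shape follows the common warmup-plus-cosine formula rather than the author's actual script.

```python
import math

PEAK_LR = 5e-6        # learning rate stated in the README
WARMUP_RATIO = 0.03   # warmup ratio stated in the README
TOTAL_STEPS = 1000    # hypothetical: the README does not give the step count


def lr_at_step(step, total_steps=TOTAL_STEPS,
               peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 15, 30, 500, 1000):
        print(step, lr_at_step(step))
```

With one epoch and a batch size of 512, the real step count would be roughly the number of training examples divided by 512, so the warmup would cover about the first 3% of those steps.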