weqweasdas committed on
Commit
0656b31
1 Parent(s): 81b58a2

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -10,6 +10,9 @@ The reward model is trained from the base model [mistralai/Mistral-7B-Instruct-v
 
 The training script is available at https://github.com/WeiXiongUST/RLHF-Reward-Modeling .
 
+Also see a short blog for the training details (data mixture, parameters...): https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0
+
+
 ## Model Details
 
 If you have any question with this reward model and also any question about reward modeling, feel free to drop me an email with [email protected]. I would be happy to chat!
@@ -39,8 +42,6 @@ We train the model for one epoch with a learning rate of 5e-6, batch size 512, c
 
 
 
-
-
 ## Uses
 
 ```python