weqweasdas committed
Commit 0656b31 · Parent(s): 81b58a2
Update README.md
README.md CHANGED
@@ -10,6 +10,9 @@ The reward model is trained from the base model [mistralai/Mistral-7B-Instruct-v

The training script is available at https://github.com/WeiXiongUST/RLHF-Reward-Modeling .

+Also see a short blog for the training details (data mixture, parameters...): https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0
+
+
## Model Details

If you have any question with this reward model and also any question about reward modeling, feel free to drop me an email with [email protected]. I would be happy to chat!

@@ -39,8 +42,6 @@ We train the model for one epoch with a learning rate of 5e-6, batch size 512, c



-
-
## Uses

```python
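
The `## Uses` code block is truncated in the diff above, so its actual contents are not shown here. As a hypothetical sketch only (not the README's snippet): scoring a conversation with a Mistral-based reward model of this kind, assuming it loads as a single-label sequence classifier and assuming the repo id `weqweasdas/RM-Mistral-7B`; both are illustrative guesses, not taken from this diff.

```python
# Hypothetical sketch, not the README's actual (truncated) snippet.
# Assumes the reward model is a single-label sequence classifier and
# that the repo id below is correct; swap in the real one as needed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "weqweasdas/RM-Mistral-7B"  # assumed repo id for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)
model.eval()

# A conversation to score: the reward model reads the full chat.
chat = [
    {"role": "user", "content": "Explain RLHF in one sentence."},
    {"role": "assistant", "content": "RLHF fine-tunes a model against a learned reward."},
]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")

with torch.no_grad():
    # With num_labels == 1, the single logit is the scalar reward.
    reward = model(input_ids).logits[0].item()
print(f"reward: {reward:.4f}")
```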