weqweasdas committed
Commit 5519e53 · 1 parent: f3a61bf
Update README.md
README.md CHANGED
@@ -6,7 +6,7 @@

 <!-- Provide a quick summary of what the model is/does. -->

-The reward model is trained from the base model [
+The reward model is trained from the base model [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

 The training script is available at https://github.com/WeiXiongUST/RLHF-Reward-Modeling .

@@ -18,7 +18,7 @@ If you have any question with this reward model and also any question about rewa

 <!-- Provide a longer summary of what this model is. -->

-The model is trained on a mixture of [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+The model is trained on a mixture of the dataset similar to [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).

 - [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
 - [SHP](https://huggingface.co/datasets/stanfordnlp/SHP)
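For context on how a checkpoint described by this README is typically queried once published, here is a minimal sketch. It assumes the reward model exposes a scalar sequence-classification head (the usual setup for Bradley-Terry reward models trained with the linked RLHF-Reward-Modeling script); the repo id `weqweasdas/RM-Mistral-7B` is a placeholder for illustration and is not stated in this commit.

```python
# Minimal sketch: scoring one chat exchange with a scalar reward model.
# Assumption: the checkpoint is loadable as AutoModelForSequenceClassification
# with a single output logit (higher = more preferred), as is typical for
# Bradley-Terry reward models trained with
# https://github.com/WeiXiongUST/RLHF-Reward-Modeling .
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "weqweasdas/RM-Mistral-7B"  # placeholder repo id, not taken from this commit
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [
    {"role": "user", "content": "What does a reward model do?"},
    {"role": "assistant", "content": "It assigns a scalar score reflecting how well a response matches human preferences."},
]
input_ids = tokenizer.apply_chat_template(chat, tokenize=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    reward = model(input_ids).logits[0, 0].item()  # single scalar score for this exchange
print(f"reward: {reward:.3f}")
```

In an RLHF or best-of-n pipeline, the same call is applied to each candidate response and the highest-scoring one is kept.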