weqweasdas committed
Commit 5519e53 · 1 parent: f3a61bf
Update README.md
README.md CHANGED
@@ -6,7 +6,7 @@

 <!-- Provide a quick summary of what the model is/does. -->

-The reward model is trained from the base model [
+The reward model is trained from the base model [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

 The training script is available at https://github.com/WeiXiongUST/RLHF-Reward-Modeling .

@@ -18,7 +18,7 @@ If you have any question with this reward model and also any question about rewa

 <!-- Provide a longer summary of what this model is. -->

-The model is trained on a mixture of [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+The model is trained on a mixture of the dataset similar to [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).

 - [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
 - [SHP](https://huggingface.co/datasets/stanfordnlp/SHP)
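For context on how a checkpoint described by this README is typically queried once published, here is a minimal sketch. It assumes the reward model exposes a scalar sequence-classification head (the usual setup for Bradley-Terry reward models trained with the linked RLHF-Reward-Modeling script); the repo id `weqweasdas/RM-Mistral-7B` is a placeholder for illustration and is not stated in this commit.

```python
# Minimal sketch: scoring one chat exchange with a scalar reward model.
# Assumption: the checkpoint is loadable as AutoModelForSequenceClassification
# with a single output logit (higher = more preferred), as is typical for
# Bradley-Terry reward models trained with
# https://github.com/WeiXiongUST/RLHF-Reward-Modeling .
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "weqweasdas/RM-Mistral-7B"  # placeholder repo id, not taken from this commit
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [
    {"role": "user", "content": "What does a reward model do?"},
    {"role": "assistant", "content": "It assigns a scalar score reflecting how well a response matches human preferences."},
]
input_ids = tokenizer.apply_chat_template(chat, tokenize=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    reward = model(input_ids).logits[0, 0].item()  # single scalar score for this exchange
print(f"reward: {reward:.3f}")
```

In an RLHF or best-of-n pipeline, the same call is applied to each candidate response and the highest-scoring one is kept.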