Commit 999e7d9
1 Parent(s): 6329d88
Update README.md
README.md CHANGED
@@ -25,7 +25,7 @@ The `RewardModel` is a [BERT](https://huggingface.co/bert-base-cased)model that

 The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.

-
+These prompt + completions are samples of intruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.

 ## Details

@@ -122,7 +122,7 @@ This will output the following:
 |---|---|
 | [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 96.54%* |

-* Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
+* *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.

 ## License

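The footnote's filtering rule (only comparisons that had a preferred option count toward the accuracy) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the evaluation script behind the 96.54% figure: the field names `prompt`, `answer_0`, `answer_1`, `score_0`, `score_1`, the helper `pairwise_accuracy`, and the stand-in scorer `toy_reward` are all hypothetical and introduced here only for the example.

```python
import math
from typing import Callable, Iterable, Mapping


def pairwise_accuracy(
    comparisons: Iterable[Mapping],
    reward_fn: Callable[[str, str], float],
) -> float:
    """Fraction of non-tied comparisons where the scorer ranks the preferred answer higher."""
    correct = 0
    total = 0
    for row in comparisons:
        # Skip ties: these comparisons have no preferred option.
        if row["score_0"] == row["score_1"]:
            continue
        preferred = 0 if row["score_0"] > row["score_1"] else 1
        r0 = reward_fn(row["prompt"], row["answer_0"])
        r1 = reward_fn(row["prompt"], row["answer_1"])
        predicted = 0 if r0 > r1 else 1
        correct += int(predicted == preferred)
        total += 1
    # With no usable comparisons the accuracy is undefined.
    return correct / total if total else math.nan


def toy_reward(prompt: str, answer: str) -> float:
    """Hypothetical stand-in for a reward model: longer answers score higher."""
    return float(len(answer))


toy_comparisons = [
    # Preferred: answer_0; the toy scorer also picks answer_0 (longer).
    {"prompt": "q1", "answer_0": "a long answer", "answer_1": "short",
     "score_0": 1.0, "score_1": -1.0},
    # Preferred: answer_0; the toy scorer picks answer_1 (longer), so it is wrong.
    {"prompt": "q2", "answer_0": "tiny", "answer_1": "a much longer answer",
     "score_0": 0.5, "score_1": -0.5},
    # Tie: excluded from the accuracy.
    {"prompt": "q3", "answer_0": "same", "answer_1": "also",
     "score_0": 0.0, "score_1": 0.0},
]

accuracy = pairwise_accuracy(toy_comparisons, toy_reward)
print(f"accuracy on non-tied comparisons: {accuracy:.2f}")
```

In practice `toy_reward` would be replaced by a call that scores a prompt and completion with the actual reward model; the tie-skipping logic stays the same.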