liang.zhao committed on
Commit 5e1a887
1 Parent(s): 0ef247a
update model and config
README.md
CHANGED
@@ -30,7 +30,7 @@ Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://hugg
 - [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1)
 - [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)
 
-Additionally, the model
+Additionally, the model is trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.
 
 
 # RewardBench Leaderboard for Generative Models