liang.zhao committed
Commit: b0a400b
Parent(s): 5e1a887
update model and config

README.md CHANGED
@@ -29,7 +29,7 @@ Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://hugg
 - [Pro (Llama-3.1)](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1)
 - [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1)
 - [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)
-
+(We use a high-quality subset of this data collection. For more details, please refer to our [Skywork-Reward-Preference-80K-v0.1 dataset.](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1))
 Additionally, the model is trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.


@@ -41,7 +41,7 @@ As of September 2024, Skywork-Critic-Llama3.1-8B ranks first on RewardBench for

 | Model | Chat | Chat Hard | Safety | Reasoning | Overall Score |
 | ------------------------------- | :---: | :-------: | :----: | :-------: | :---: |
-| Skywork-Critic-Llama3.1-8B * | **93.
+| Skywork-Critic-Llama3.1-8B * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
 | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
 | facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
 | google/gemini-1.5-pro-0514 | 92.3 | 80.6 | 87.9 | 92.0 | 88.2 |
@@ -138,7 +138,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
 ```bibtex
 @misc{skyworkcritic2024,
 title={Skywork Critic Model Series},
-author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang},
+author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang and Liu Yang},
 year={2024},
 month={September},
 howpublished={\url{https://huggingface.co/Skywork}},
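The diff above notes that the model is instruction-tuned for pairwise preference evaluation. As a point of reference only (not part of this commit), here is a minimal sketch of how such a pairwise judgment could be run with Hugging Face `transformers`; the repo id `Skywork/Skywork-Critic-Llama3.1-8B` and the judge prompt below are assumptions for illustration, and the model card's own usage section may specify a different official prompt template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; the model card may document a different loading recipe.
model_id = "Skywork/Skywork-Critic-Llama3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Two candidate responses to the same prompt; the critic picks the better one.
prompt = "Explain why the sky appears blue."
response_a = (
    "Sunlight is scattered by air molecules, and shorter blue wavelengths "
    "scatter the most (Rayleigh scattering), so the sky looks blue."
)
response_b = "The sky reflects the color of the ocean."

# Hypothetical pairwise-judge instruction; the verdict is expected as [[A]] or [[B]].
user_message = (
    "You are a judge comparing two responses to the same prompt. "
    "Decide which response is better and answer with [[A]] or [[B]] only.\n\n"
    f"Prompt: {prompt}\n\nResponse A: {response_a}\n\nResponse B: {response_b}"
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_message}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)

# Print only the newly generated verdict tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```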