liang.zhao committed on
Commit b0a400b
1 Parent(s): 5e1a887

update model and config

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -29,7 +29,7 @@ Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://hugg
  - [Pro (Llama-3.1)](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1)
  - [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1)
  - [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)
-
+ (We use a high-quality subset of this data collection. For more details, please refer to our [Skywork-Reward-Preference-80K-v0.1 dataset.](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1))
  Additionally, the model is trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.
 
 
@@ -41,7 +41,7 @@ As of September 2024, Skywork-Critic-Llama3.1-8B ranks first on RewardBench for
 
  | Model | Chat | Chat Hard | Safety | Reasoning | Overall Score |
  | ------------------------------- | :---: | :-------: | :----: | :-------: | :---: |
- | Skywork-Critic-Llama3.1-8B * | **93.9** | **81.4** | **91.6** | **89.8** | **89.1** |
+ | Skywork-Critic-Llama3.1-8B * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
  | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
  | facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
  | google/gemini-1.5-pro-0514 | 92.3 | 80.6 | 87.9 | 92.0 | 88.2 |
@@ -138,7 +138,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
  ```bibtex
  @misc{skyworkcritic2024,
  title={Skywork Critic Model Series},
- author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang},
+ author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang and Liu Yang},
  year={2024},
  month={September},
  howpublished={\url{https://huggingface.co/Skywork}},
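
The added line in the first hunk points readers to the Skywork-Reward-Preference-80K-v0.1 collection on the Hub. As a minimal sketch (not part of the commit), here is how one might peek at that dataset with the `datasets` library; the available splits and column names are not stated in this diff, so the code only prints the schema rather than assuming specific fields.

```python
# Minimal sketch: inspect the preference-data subset referenced in the diff.
# Assumes the `datasets` library is installed; splits/columns are not specified
# in the README diff, so we print the structure instead of hard-coding fields.
from datasets import load_dataset

ds = load_dataset("Skywork/Skywork-Reward-Preference-80K-v0.1")
print(ds)                     # shows available splits and row counts
first_split = next(iter(ds))  # pick whichever split exists
print(ds[first_split][0])     # inspect one preference example
```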
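
The changed paragraph describes instruction tuning focused on pairwise preference evaluation, and the updated table reports the corresponding RewardBench scores. The sketch below shows, under stated assumptions, how such a pairwise judgment could be run with `transformers`: the repository id `Skywork/Skywork-Critic-Llama3.1-8B`, the example question/answers, and the judge prompt wording are illustrative assumptions, not the model card's official template.

```python
# Minimal sketch of pairwise preference evaluation with the critic model.
# The prompt below is illustrative only; consult the model card for the
# recommended judge template and verdict format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-Critic-Llama3.1-8B"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "How do I reverse a list in Python?"
answer_a = "Use my_list.reverse() in place, or reversed(my_list) for an iterator."
answer_b = "Lists cannot be reversed in Python."

# Ask the critic to compare the two candidate responses and pick the better one.
user_prompt = (
    "Please act as an impartial judge and compare the two responses below.\n"
    f"[Question]\n{question}\n\n[Response A]\n{answer_a}\n\n[Response B]\n{answer_b}\n\n"
    'Output your verdict as "[[A]]" or "[[B]]".'
)
messages = [{"role": "user", "content": user_prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
# Print only the newly generated verdict, not the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```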