liang.zhao committed on
Commit
1177f07
1 Parent(s): b0a400b

update model and config

Files changed (1)
  1. README.md +7 -12
README.md CHANGED
@@ -19,18 +19,13 @@ pipeline_tag: text-generation
  # Training Details


- Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and fine-tuned on various datasets. These include open-source data such as:
- 1. [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2)
- 2. [OffsetBias](https://huggingface.co/datasets/NCSOFT/offsetbias)
- 3. [WildGuard (adversarial)](https://huggingface.co/allenai/wildguard)
- 4. Magpie DPO series:
-    - [Ultra](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1)
-    - [Pro (Llama-3.1)](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1)
-    - [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1)
-    - [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)
- (We use a high-quality subset of this data collection. For more details, please refer to our [Skywork-Reward-Preference-80K-v0.1 dataset.](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1))
- Additionally, the model is trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.

  # RewardBench Leaderboard for Generative Models
@@ -138,7 +133,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
  ```bibtex
  @misc{skyworkcritic2024,
      title={Skywork Critic Model Series},
-     author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang and Liu Yang},
      year={2024},
      month={September},
      howpublished={\url{https://huggingface.co/Skywork}},
 
  # Training Details


+ Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and has been fine-tuned on a diverse array of high-quality datasets, including:
+ - **Cleaned open-source data**: We utilize a high-quality subset of [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2), [OffsetBias](https://huggingface.co/datasets/NCSOFT/offsetbias), [WildGuard (adversarial)](https://huggingface.co/allenai/wildguard), and the Magpie DPO series ([Ultra](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1), [Pro (Llama-3.1)](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1), [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1), [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)). For more details, please refer to our [Skywork-Reward-Preference-80K-v0.1](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1) dataset.
+ - **In-house human annotation data**: This includes both pointwise scoring of a single response across multiple dimensions and pairwise comparisons between two responses. Each dimension incorporates a rationale for the assigned score.
+ - **Synthetic critic data**: We use an approach similar to [**self-taught**](https://arxiv.org/abs/2408.02666). Specifically, we employ two methods to generate inferior responses for a given instruction: 1) creating a similar instruction and then generating a response to that new instruction, and 2) introducing subtle errors into high-quality responses.
+ - **Critic-related chat data**: We incorporate critic-related chat data to maintain the model's conversational capabilities.

+ The training employs an instruction-tuning methodology focused on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.

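The pairwise preference evaluation described above amounts to prompting the critic with an instruction and two candidate responses, then parsing its verdict. The sketch below illustrates that flow; the prompt template, function names, and verdict markers are assumptions for illustration, not Skywork's actual format:

```python
# Hypothetical sketch of pairwise preference evaluation with a generative
# critic. The template and the [[A]]/[[B]] verdict markers are assumptions,
# NOT Skywork's actual prompt format.

def build_pairwise_prompt(instruction: str, response_a: str, response_b: str) -> str:
    """Format an instruction and two candidate responses for comparison."""
    return (
        "Please act as an impartial judge and compare the two responses to the "
        "instruction below. Output [[A]] if Response A is better, [[B]] otherwise.\n\n"
        f"[Instruction]\n{instruction}\n\n"
        f"[Response A]\n{response_a}\n\n"
        f"[Response B]\n{response_b}\n"
    )

def parse_verdict(completion: str) -> str:
    """Extract the preferred response label from the critic's completion."""
    return "A" if "[[A]]" in completion else "B" if "[[B]]" in completion else "unknown"

prompt = build_pairwise_prompt(
    "Explain photosynthesis in one sentence.",
    "Plants convert sunlight, water, and CO2 into glucose and oxygen.",
    "Photosynthesis is when animals breathe.",
)
print(parse_verdict("After comparing both, my verdict is [[A]]."))  # → A
```

In practice the prompt would be sent to the critic model (e.g. via a standard text-generation pipeline) and the completion passed to the parser.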
  # RewardBench Leaderboard for Generative Models
 
  ```bibtex
  @misc{skyworkcritic2024,
      title={Skywork Critic Model Series},
+     author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang and Liu, Yang},
      year={2024},
      month={September},
      howpublished={\url{https://huggingface.co/Skywork}},