Skywork
/

Skywork-Critic-Llama-3.1-8B

Text Generation


liang.zhao committed on Sep 12

Commit 5e1a887 • 1 Parent(s): 0ef247a

update model and config

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -30,7 +30,7 @@ Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://hugg
     - [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1)
     - [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)
-Additionally, the model was trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.
+Additionally, the model is trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.
 # RewardBench Leaderboard for Generative Models