liang.zhao committed
Commit: b0a400b
Parent(s): 5e1a887
update model and config

README.md CHANGED
@@ -29,7 +29,7 @@ Skywork-Critic-Llama3.1-8B is built on Meta [Llama-3.1-8B-Instruct](https://hugg
 - [Pro (Llama-3.1)](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1)
 - [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1)
 - [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)
-
+(We use a high-quality subset of this data collection. For more details, please refer to our [Skywork-Reward-Preference-80K-v0.1 dataset.](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1))
 Additionally, the model is trained on in-house human annotation data, synthetic data similar to the [**self-taught**](https://arxiv.org/abs/2408.02666) approach, and critic-related chat data. The training employs instruction-tuning methodology, focusing on pairwise preference evaluation and general chat tasks. We have conducted a thorough verification process to ensure our training dataset does not contain any test set information from RewardBench, maintaining the integrity of our evaluation results.


@@ -41,7 +41,7 @@ As of September 2024, Skywork-Critic-Llama3.1-8B ranks first on RewardBench for

 | Model | Chat | Chat Hard | Safety | Reasoning | Overall Score |
 | ------------------------------- | :---: | :-------: | :----: | :-------: | :---: |
-| Skywork-Critic-Llama3.1-8B * | **93.
+| Skywork-Critic-Llama3.1-8B * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
 | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
 | facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
 | google/gemini-1.5-pro-0514 | 92.3 | 80.6 | 87.9 | 92.0 | 88.2 |
@@ -138,7 +138,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
 ```bibtex
 @misc{skyworkcritic2024,
 title={Skywork Critic Model Series},
-author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang},
+author={Shiwen, Tu and Liang, Zhao and Liu, Chris Yuhao and Zeng, Liang and Liu Yang},
 year={2024},
 month={September},
 howpublished={\url{https://huggingface.co/Skywork}},
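The diff above notes that the model is instruction-tuned for pairwise preference evaluation. As a point of reference only (not part of this commit), here is a minimal sketch of how such a pairwise judgment could be run with Hugging Face `transformers`; the repo id `Skywork/Skywork-Critic-Llama3.1-8B` and the judge prompt below are assumptions for illustration, and the model card's own usage section may specify a different official prompt template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; the model card may document a different loading recipe.
model_id = "Skywork/Skywork-Critic-Llama3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Two candidate responses to the same prompt; the critic picks the better one.
prompt = "Explain why the sky appears blue."
response_a = (
    "Sunlight is scattered by air molecules, and shorter blue wavelengths "
    "scatter the most (Rayleigh scattering), so the sky looks blue."
)
response_b = "The sky reflects the color of the ocean."

# Hypothetical pairwise-judge instruction; the verdict is expected as [[A]] or [[B]].
user_message = (
    "You are a judge comparing two responses to the same prompt. "
    "Decide which response is better and answer with [[A]] or [[B]] only.\n\n"
    f"Prompt: {prompt}\n\nResponse A: {response_a}\n\nResponse B: {response_b}"
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_message}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)

# Print only the newly generated verdict tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```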