Skywork/Skywork-Critic-Llama-3.1-8B

zhao1iang committed on Sep 29

Commit 825f345 • 1 Parent(s): 1b099da

Update README.md

Files changed (1)

README.md +0 -3

@@ -40,7 +40,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
 | **Skywork-Critic-Llama3.1-70B**  *      | **96.6**  |   **87.9**    |  **93.1**  |   **95.5**    | **93.3**  |
 | Salesforce/SFR-LLaMa-3.1-70B-Judge-r      | 96.9 | 84.8 | 91.6 | 97.6    | 92.7  |
 | Salesforce/SFR-nemo-12B-Judge-r      | 97.2 | 82.2 | 86.5 | 95.1    | 90.3  |
-| **Skywork-Critic-Llama3.1-70B**  #      | **94.4**  |   **82.9**    |  **89.7**  |   **90.2**    | **89.3**  |
 | **Skywork-Critic-Llama3.1-8B**  *      | **93.6**  |   **81.4**    |  **91.1**  |   **89.8**    | **89.0**  |
 | Salesforce/SFR-LLaMa-3.1-8B-Judge-r      | 95.5 | 77.7 | 86.2 | 95.1    | 88.7  |
 | facebook/Self-taught-Llama-3-70B  | 96.9  |   84.0    |  91.1  |   82.5    | 88.6  |
@@ -52,8 +51,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
 | meta-llama/Meta-Llama-3.1-70B-Instruct *       | 97.2 |   70.2    |  82.8  |   86.0    | 84.0  |
 | NCSOFT/Llama-3-OffsetBias-8B *       | 92.5  |   80.3    |  86.8  |   76.4    | 84.0  |
-For the Skywork-Critic-Llama3.1-70B model, we tested two types of prompts. The first simply asks the model to determine whether the response from model A or B is better, while the second prompt, using # to indicate this prompt, requires the model not only to choose the better response but also to provide specific reasoning. Surprisingly, the first approach yielded higher accuracy. Accurately generating critique explanations remains a challenge for the critic model and will be a key focus of our future research.
 # Demo Code
 Below is an example of obtaining the critic of two conversations.
