Update README.md
README.md
CHANGED
@@ -40,7 +40,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
 | **Skywork-Critic-Llama3.1-70B** * | **96.6** | **87.9** | **93.1** | **95.5** | **93.3** |
 | Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
 | Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
-| **Skywork-Critic-Llama3.1-70B** # | **94.4** | **82.9** | **89.7** | **90.2** | **89.3** |
 | **Skywork-Critic-Llama3.1-8B** * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
 | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
 | facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
@@ -52,8 +51,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
 | meta-llama/Meta-Llama-3.1-70B-Instruct * | 97.2 | 70.2 | 82.8 | 86.0 | 84.0 |
 | NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |
 
-For the Skywork-Critic-Llama3.1-70B model, we tested two types of prompts. The first simply asks the model to determine whether the response from model A or B is better, while the second prompt, using # to indicate this prompt, requires the model not only to choose the better response but also to provide specific reasoning. Surprisingly, the first approach yielded higher accuracy. Accurately generating critique explanations remains a challenge for the critic model and will be a key focus of our future research.
-
 
 # Demo Code
 Below is an example of obtaining the critic of two conversations.