zhao1iang committed on
Commit 825f345
1 Parent(s): 1b099da

Update README.md

Files changed (1)
  1. README.md +0 -3
README.md CHANGED
@@ -40,7 +40,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
  | **Skywork-Critic-Llama3.1-70B** * | **96.6** | **87.9** | **93.1** | **95.5** | **93.3** |
  | Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
  | Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
- | **Skywork-Critic-Llama3.1-70B** # | **94.4** | **82.9** | **89.7** | **90.2** | **89.3** |
  | **Skywork-Critic-Llama3.1-8B** * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
  | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
  | facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
@@ -52,8 +51,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
  | meta-llama/Meta-Llama-3.1-70B-Instruct * | 97.2 | 70.2 | 82.8 | 86.0 | 84.0 |
  | NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |
 
- For the Skywork-Critic-Llama3.1-70B model, we tested two types of prompts. The first simply asks the model to determine whether the response from model A or B is better, while the second prompt, using # to indicate this prompt, requires the model not only to choose the better response but also to provide specific reasoning. Surprisingly, the first approach yielded higher accuracy. Accurately generating critique explanations remains a challenge for the critic model and will be a key focus of our future research.
-
 
  # Demo Code
  Below is an example of obtaining the critic of two conversations.
 