Lin-K76 committed
Commit 254f643 · verified · 1 Parent(s): 74e8382

Update README.md

Files changed (1)
  1. README.md +24 -22
README.md CHANGED
````diff
@@ -25,7 +25,8 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
-It achieves an average score of 77.75 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 78.67.
+<!-- It achieves an average score of 77.75 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 78.67. -->
+It achieves an average recovery of 99.44% on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1) compared to the unquantized model.
 
 ### Model Optimizations
 
@@ -170,6 +171,7 @@ lm_eval \
   --tasks openllm \
   --batch_size auto
 ```
+Certain benchmarks for the full-precision model are still being acquired. Average recovery is calculated only over the metrics on which both models have been evaluated.
 
 ### Accuracy
 
@@ -188,71 +190,71 @@ lm_eval \
 <tr>
  <td>MMLU (5-shot)
  </td>
- <td>82.21
+ <td>*
  </td>
- <td>82.24
+ <td>86.06
  </td>
- <td>100.0%
+ <td>*
  </td>
 </tr>
 <tr>
  <td>ARC Challenge (25-shot)
  </td>
- <td>70.65
+ <td>73.38
  </td>
- <td>69.03
+ <td>72.87
  </td>
- <td>97.71%
+ <td>99.30%
  </td>
 </tr>
 <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
- <td>87.95
+ <td>95.07
  </td>
- <td>86.50
+ <td>94.39
  </td>
- <td>98.35%
+ <td>99.28%
  </td>
 </tr>
 <tr>
  <td>Hellaswag (10-shot)
  </td>
- <td>86.33
+ <td>*
  </td>
- <td>85.67
+ <td>*
  </td>
- <td>99.24%
+ <td>*
  </td>
 </tr>
 <tr>
  <td>Winogrande (5-shot)
  </td>
- <td>85.00
+ <td>87.21
  </td>
- <td>85.79
+ <td>86.98
  </td>
- <td>100.9%
+ <td>99.74%
  </td>
 </tr>
 <tr>
  <td>TruthfulQA (0-shot)
  </td>
- <td>59.90
+ <td>*
  </td>
- <td>57.24
+ <td>64.9
  </td>
- <td>95.56%
+ <td>*
  </td>
 </tr>
 <tr>
  <td><strong>Average</strong>
  </td>
- <td><strong>78.67</strong>
+ <td><strong>*</strong>
  </td>
- <td><strong>77.75</strong>
+ <td><strong>*</strong>
  </td>
- <td><strong>98.82%</strong>
+ <td><strong>99.44%</strong>
  </td>
 </tr>
 </table>
````
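
For readers checking the new numbers: the 99.44% in the Average row follows from the three rows where both scores are already available (ARC Challenge, GSM-8K, Winogrande), exactly as the added note describes. Below is a minimal sketch of that arithmetic with the table values hard-coded and `*` encoded as `None`; the script is illustrative, not code from the repository.

```python
# Recovery arithmetic behind the updated model card: per-benchmark
# recovery = quantized / unquantized * 100, averaged only over the
# benchmarks where both models have been evaluated ("*" -> None).
scores = {
    # benchmark: (unquantized, quantized), values from the new table
    "MMLU (5-shot)": (None, 86.06),
    "ARC Challenge (25-shot)": (73.38, 72.87),
    "GSM-8K (5-shot, strict-match)": (95.07, 94.39),
    "Hellaswag (10-shot)": (None, None),
    "Winogrande (5-shot)": (87.21, 86.98),
    "TruthfulQA (0-shot)": (None, 64.9),
}

recoveries = [
    quantized / unquantized * 100
    for unquantized, quantized in scores.values()
    if unquantized is not None and quantized is not None
]

# Per-row: 99.30%, 99.28%, 99.74% -> mean 99.44%, matching the table.
print(f"average recovery: {sum(recoveries) / len(recoveries):.2f}%")
```

Note that the quantized-model average stays `*` as well: it cannot be formed yet because the quantized Hellaswag score is also still missing.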