Lin-K76 committed on
Commit c18df4a · verified · 1 Parent(s): 022a031

Update README.md

Files changed (1): README.md (+71 -11)
README.md CHANGED
@@ -162,16 +162,9 @@ oneshot(
 
 ## Evaluation
 
- The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command.
- A modified version of ARC-C and GSM8k-cot was used for evaluations, in line with Llama 3.1's prompting. It can be accessed on the [Neural Magic fork of the lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct).
- Additional evaluations that were collected for the original Llama 3.1 models will be added in the future.
- ```
- lm_eval \
-   --model vllm \
-   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,tensor_parallel_size=8,gpu_memory_utilization=0.755,add_bos_token=True,max_model_len=4096 \
-   --tasks openllm \
-   --batch_size auto
- ```
 
 ### Accuracy
 
@@ -257,4 +250,71 @@ lm_eval \
 <td><strong>99.74%</strong>
 </td>
 </tr>
- </table>
 
 ## Evaluation
 
+ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande, and TruthfulQA.
+ Evaluation was conducted using the Neural Magic fork of the [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch `llama_3.1_instruct`) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
+ This version of the lm-evaluation-harness includes versions of ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
 
 ### Accuracy
 
 
 <td><strong>99.74%</strong>
 </td>
 </tr>
+ </table>
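The recovery figures in the table (for example, the 99.74% above) are the quantized model's score expressed as a percentage of the unquantized baseline's score. A minimal sketch of that calculation — the function name and the example values are illustrative, not part of the original README:

```python
# Illustrative helper: "accuracy recovery" of a quantized model, i.e. its
# benchmark score as a percentage of the unquantized baseline's score.
def recovery(quantized_score: float, baseline_score: float) -> float:
    """Return quantized/baseline as a percentage, rounded to 2 decimals."""
    return round(100.0 * quantized_score / baseline_score, 2)

print(recovery(50.0, 100.0))  # 50.0
```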
+ 
+ 
+ ### Reproduction
+ 
+ The results were obtained using the following commands:
+ 
+ #### MMLU
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks mmlu \
+   --num_fewshot 5 \
+   --batch_size auto
+ ```
+ 
+ #### ARC-Challenge
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks arc_challenge_llama_3.1_instruct \
+   --apply_chat_template \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
+ 
+ #### GSM-8K
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks gsm8k_cot_llama_3.1_instruct \
+   --apply_chat_template \
+   --num_fewshot 8 \
+   --batch_size auto
+ ```
+ 
+ #### Hellaswag
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks hellaswag \
+   --num_fewshot 10 \
+   --batch_size auto
+ ```
+ 
+ #### Winogrande
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks winogrande \
+   --num_fewshot 5 \
+   --batch_size auto
+ ```
+ 
+ #### TruthfulQA
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks truthfulqa_mc \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
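The six commands above differ only in task name, few-shot count, and whether `--apply_chat_template` is passed. A hypothetical helper that generates them from a table — the helper itself is illustrative and not part of the original README; the task names and flags are copied from the commands above:

```python
# Illustrative sketch: generate the lm_eval invocations above from a table,
# since only the task, few-shot count, and chat-template flag vary.
MODEL_ARGS = (
    'pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8",'
    "dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8"
)

# (task, num_fewshot, apply_chat_template) -- taken from the commands above.
TASKS = [
    ("mmlu", 5, False),
    ("arc_challenge_llama_3.1_instruct", 0, True),
    ("gsm8k_cot_llama_3.1_instruct", 8, True),
    ("hellaswag", 10, False),
    ("winogrande", 5, False),
    ("truthfulqa_mc", 0, False),
]

def build_cmd(task: str, num_fewshot: int, chat_template: bool) -> str:
    """Assemble one lm_eval command line with backslash continuations."""
    parts = [
        "lm_eval",
        "--model vllm",
        f"--model_args {MODEL_ARGS}",
        f"--tasks {task}",
    ]
    if chat_template:
        parts.append("--apply_chat_template")
    parts += [f"--num_fewshot {num_fewshot}", "--batch_size auto"]
    return " \\\n  ".join(parts)

for spec in TASKS:
    print(build_cmd(*spec), end="\n\n")
```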