Lin-K76 committed
Commit b9e995e · verified · 1 Parent(s): b589a15

Update README.md

Files changed (1):
  1. README.md +70 -11

README.md CHANGED
@@ -130,16 +130,9 @@ oneshot(
 
 ## Evaluation
 
-The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command.
-A modified version of ARC-C and GSM8k-cot was used for evaluations, in line with Llama 3.1's prompting. It can be accessed on the [Neural Magic fork of the lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct).
-Additional evaluations that were collected for the original Llama 3.1 models will be added in the future.
-```
-lm_eval \
-  --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,tensor_parallel_size=2,gpu_memory_utilization=0.8,add_bos_token=True,max_model_len=4096 \
-  --tasks openllm \
-  --batch_size auto
-```
 
 ### Accuracy
 
@@ -225,4 +218,70 @@ lm_eval \
  <td><strong>99.76%</strong>
  </td>
 </tr>
-</table>
 
 ## Evaluation
 
+The model was evaluated on MMLU, ARC-Challenge, GSM-8K, HellaSwag, Winogrande, and TruthfulQA.
+Evaluation was conducted using the [Neural Magic fork of lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch `llama_3.1_instruct`) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
+This version of lm-evaluation-harness includes versions of ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
 
 ### Accuracy
 
 
  <td><strong>99.76%</strong>
  </td>
 </tr>
+</table>
+
+### Reproduction
+
+The results were obtained using the following commands:
+
+#### MMLU
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2 \
+  --tasks mmlu \
+  --num_fewshot 5 \
+  --batch_size auto
+```
+
+#### ARC-Challenge
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2 \
+  --tasks arc_challenge_llama_3.1_instruct \
+  --apply_chat_template \
+  --num_fewshot 0 \
+  --batch_size auto
+```
+
+#### GSM-8K
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2 \
+  --tasks gsm8k_cot_llama_3.1_instruct \
+  --apply_chat_template \
+  --num_fewshot 8 \
+  --batch_size auto
+```
+
+#### HellaSwag
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2 \
+  --tasks hellaswag \
+  --num_fewshot 10 \
+  --batch_size auto
+```
+
+#### Winogrande
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2 \
+  --tasks winogrande \
+  --num_fewshot 5 \
+  --batch_size auto
+```
+
+#### TruthfulQA
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2 \
+  --tasks truthfulqa_mc \
+  --num_fewshot 0 \
+  --batch_size auto
+```
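The six commands in the diff differ only in task name, few-shot count, and whether `--apply_chat_template` is passed. As a convenience, they can be driven from one loop. The sketch below is illustrative and not part of the commit; it assumes `lm_eval` (the Neural Magic fork) and vLLM are installed, and the hypothetical `DRY_RUN` switch defaults to printing the commands instead of launching them.

```shell
# Illustrative helper (not from the commit): build and optionally run one
# lm_eval invocation with the shared model arguments used in the README.
MODEL='neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic'
MODEL_ARGS="pretrained=\"${MODEL}\",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=2"

run_task() {
  local task="$1" fewshot="$2" chat="$3"
  local cmd=(lm_eval --model vllm --model_args "${MODEL_ARGS}"
             --tasks "${task}" --num_fewshot "${fewshot}" --batch_size auto)
  if [ "${chat}" = "chat" ]; then
    cmd+=(--apply_chat_template)   # only the Llama-3.1-style tasks use the chat template
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[@]}"               # default: print the command that would run
  else
    "${cmd[@]}"                    # DRY_RUN=0: actually launch the evaluation
  fi
}

# Task, few-shot count, and chat-template usage, matching the README commands.
run_task mmlu 5 no_chat
run_task arc_challenge_llama_3.1_instruct 0 chat
run_task gsm8k_cot_llama_3.1_instruct 8 chat
run_task hellaswag 10 no_chat
run_task winogrande 5 no_chat
run_task truthfulqa_mc 0 no_chat
```

Set `DRY_RUN=0` to launch the runs for real; `tensor_parallel_size=2` assumes two GPUs, as in the original commands.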