feihu.hf committed
Commit abc8073
Parents: 7c7b419, afaaf53

Merge branch 'main' of hf.co:Qwen/Qwen2-72B-Instruct

Files changed (1):
  1. README.md +27 -0
README.md CHANGED
@@ -130,6 +130,33 @@ Or you can install vLLM from [source](https://github.com/vllm-project/vllm/).
 
 **Note**: Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
+ ## Evaluation
+
+ We briefly compare Qwen2-72B-Instruct with similar-sized instruction-tuned LLMs, including our previous Qwen1.5-72B-Chat. The results are shown as follows:
+
+ | Datasets | Llama-3-70B-Instruct | Qwen1.5-72B-Chat | **Qwen2-72B-Instruct** |
+ | :--- | :---: | :---: | :---: |
+ | _**English**_ | | | |
+ | MMLU | 82.0 | 75.6 | **82.3** |
+ | MMLU-Pro | 56.2 | 51.7 | **64.4** |
+ | GPQA | 41.9 | 39.4 | **42.4** |
+ | TheoremQA | 42.5 | 28.8 | **44.4** |
+ | MT-Bench | 8.95 | 8.61 | **9.12** |
+ | Arena-Hard | 41.1 | 36.1 | **48.1** |
+ | IFEval (Prompt Strict-Acc.) | 77.3 | 55.8 | **77.6** |
+ | _**Coding**_ | | | |
+ | HumanEval | 81.7 | 71.3 | **86.0** |
+ | MBPP | **82.3** | 71.9 | 80.2 |
+ | MultiPL-E | 63.4 | 48.1 | **69.2** |
+ | EvalPlus | 75.2 | 66.9 | **79.0** |
+ | LiveCodeBench | 29.3 | 17.9 | **35.7** |
+ | _**Mathematics**_ | | | |
+ | GSM8K | **93.0** | 82.7 | 91.1 |
+ | MATH | 50.4 | 42.5 | **59.7** |
+ | _**Chinese**_ | | | |
+ | C-Eval | 61.6 | 76.1 | **83.8** |
+ | AlignBench | 7.42 | 7.28 | **8.27** |
+
 ## Citation
 
  If you find our work helpful, feel free to give us a cite.
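
For context on the `rope_scaling` note preserved in the diff above: enabling long-context inference with vLLM's static YaRN support amounts to adding a `rope_scaling` entry to the model's `config.json`. The sketch below shows one way to script that edit; the local path is hypothetical, and the factor of 4.0 and the 32768-token original context length are assumptions drawn from Qwen's published long-context settings rather than anything stated in this commit.

```python
# Minimal sketch (assumed values): add a static YaRN `rope_scaling` entry to
# config.json so vLLM applies a fixed scaling factor for long-context inputs.
# Verify the exact keys and values against the Qwen2-72B-Instruct model card.
import json
from pathlib import Path

config_path = Path("Qwen2-72B-Instruct/config.json")  # hypothetical local checkout

config = json.loads(config_path.read_text())
config["rope_scaling"] = {
    "factor": 4.0,                               # assumed scaling factor
    "original_max_position_embeddings": 32768,   # assumed native context length
    "type": "yarn",
}
config_path.write_text(json.dumps(config, indent=2) + "\n")
```

Because vLLM applies this factor regardless of input length, the note advises omitting the entry when long contexts are not needed, to avoid degrading quality on shorter prompts.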