yangapku committed on
Commit: 30765d9
Parent: 32bc2d2

Update README.md

Files changed (1):
  1. README.md +27 -0
README.md CHANGED
@@ -124,6 +124,33 @@ For deployment, we recommend using vLLM. You can enable the long-context capabil
 
 **Note**: Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
+## Evaluation
+
+We briefly compare Qwen2-72B-Instruct with similar-sized instruction-tuned LLMs. The results are shown as follows:
+
+| Datasets | Llama-3-70B-Instruct | Qwen1.5-72B-Chat | **Qwen2-72B-Instruct** |
+| :--- | :---: | :---: | :---: |
+| _**English**_ | | | |
+| MMLU | 82.0 | 75.6 | **82.3** |
+| MMLU-Pro | 56.2 | 51.7 | **64.4** |
+| GPQA | 41.9 | 39.4 | **42.4** |
+| TheoremQA | 42.5 | 28.8 | **44.4** |
+| MT-Bench | 8.95 | 8.61 | **9.12** |
+| Arena-Hard | 41.1 | 36.1 | **48.1** |
+| IFEval (Prompt Strict-Acc.) | 77.3 | 55.8 | **77.6** |
+| _**Coding**_ | | | |
+| HumanEval | 81.7 | 71.3 | **86.0** |
+| MBPP | **82.3** | 71.9 | 80.2 |
+| MultiPL-E | 63.4 | 48.1 | **69.2** |
+| EvalPlus | 75.2 | 66.9 | **79.0** |
+| LiveCodeBench | 29.3 | 17.9 | **35.7** |
+| _**Mathematics**_ | | | |
+| GSM8K | **93.0** | 82.7 | 91.1 |
+| MATH | 50.4 | 42.5 | **59.7** |
+| _**Chinese**_ | | | |
+| C-Eval | 61.6 | 76.1 | **83.8** |
+| AlignBench | 7.42 | 7.28 | **8.27** |
+
 ## Citation
 
 If you find our work helpful, feel free to give us a cite.
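
For reference, the `rope_scaling` configuration mentioned in the note above is a JSON entry added to the model's `config.json`. A minimal sketch, assuming the YaRN scaling scheme and illustrative values (the exact `factor` depends on how far beyond the native context window you want to extend; `32768` assumes a 32K native context):

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Because vLLM applies this scaling statically, removing the entry again when serving only short inputs avoids the short-text performance impact the note warns about.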