Update README.md

For deployment, we recommend using vLLM. You can enable the long-context capabilities by adding the `rope_scaling` configuration.
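
As a reference point, here is a minimal sketch of offline inference through vLLM's Python API. The `tensor_parallel_size` and sampling settings are illustrative assumptions, not recommended values from this README.

```python
# Minimal sketch: offline inference with vLLM's Python API.
# Assumes vLLM is installed and enough GPU memory is available;
# tensor_parallel_size and the sampling settings are illustrative only.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2-72B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=8)

# Build a chat-formatted prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
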
**Note**: Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
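
When long-context processing is required, one common way to add the configuration is to edit the checkpoint's `config.json` before launching vLLM. The sketch below is illustrative; the local path and the YARN values are assumptions rather than settings stated in this README.

```python
import json

# Illustrative sketch: add a static YARN `rope_scaling` entry to a local copy of
# the model's config.json. The path and scaling values are assumptions for
# demonstration, not settings stated in this README.
config_path = "Qwen2-72B-Instruct/config.json"

with open(config_path) as f:
    config = json.load(f)

config["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,                              # constant scaling factor (static YARN)
    "original_max_position_embeddings": 32768,  # native context length before scaling
}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Because the factor is static, it also applies to short inputs, which is why the note above advises adding it only when long contexts are actually needed.
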
## Evaluation

We briefly compare Qwen2-72B-Instruct with similar-sized instruction-tuned LLMs. The results are shown as follows:

| Datasets | Llama-3-70B-Instruct | Qwen1.5-72B-Chat | **Qwen2-72B-Instruct** |
| :--- | :---: | :---: | :---: |
| _**English**_ |  |  |  |
| MMLU | 82.0 | 75.6 | **82.3** |
| MMLU-Pro | 56.2 | 51.7 | **64.4** |
| GPQA | 41.9 | 39.4 | **42.4** |
| TheoremQA | 42.5 | 28.8 | **44.4** |
| MT-Bench | 8.95 | 8.61 | **9.12** |
| Arena-Hard | 41.1 | 36.1 | **48.1** |
| IFEval (Prompt Strict-Acc.) | 77.3 | 55.8 | **77.6** |
| _**Coding**_ |  |  |  |
| HumanEval | 81.7 | 71.3 | **86.0** |
| MBPP | **82.3** | 71.9 | 80.2 |
| MultiPL-E | 63.4 | 48.1 | **69.2** |
| EvalPlus | 75.2 | 66.9 | **79.0** |
| LiveCodeBench | 29.3 | 17.9 | **35.7** |
| _**Mathematics**_ |  |  |  |
| GSM8K | **93.0** | 82.7 | 91.1 |
| MATH | 50.4 | 42.5 | **59.7** |
| _**Chinese**_ |  |  |  |
| C-Eval | 61.6 | 76.1 | **83.8** |
| AlignBench | 7.42 | 7.28 | **8.27** |

## Citation
If you find our work helpful, feel free to give us a cite.