JoeyHeisenberg committed
Commit 9c4f598 (1 parent: 57c05a0)
Update README.md
README.md CHANGED
@@ -22,7 +22,7 @@ BlueLM 是由 vivo AI 全球研究院自主研发的大规模预训练语言模
|
|
22 |
BlueLM is a large-scale open-source language model independently developed by the vivo AI Lab. This release includes 2K and 32K context length versions for both Base and Chat models.
|
23 |
|
24 |
- **High-quality Data**: BlueLM is trained on a high-quality corpus of 2.6 trillion tokens. The training corpus contains Chinese, English, Japanese and Korean data.
|
25 |
-
- **Stronger Performance**: BlueLM-7B-Chat achieves
|
26 |
- **Longer Context**: We have extended the context length of both BlueLM-7B-Base-32K and BlueLM-7B-Chat-32K models from 2K to 32K. The models can support longer context understanding while maintaining the same basic capabilities.
|
27 |
- **Model License**: BlueLM weights are open for academic research and commercial use.
|
28 |
|
@@ -32,9 +32,26 @@ The release versions and hugging face download links are listed in the table bel
|
|
32 |
|
33 |
| | Base Model | Chat Model | 4bits Quantized Chat Model |
|
34 |
|:---:|:--------------------:|:--------------------:|:--------------------------:|
|
35 |
-
| 7B | [BlueLM-7B-Base](https://huggingface.co/vivo-ai/BlueLM-7B-Base) | [BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) | [BlueLM-7B-Chat-4bits](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-4bits) |
|
36 |
| 7B-32K | [BlueLM-7B-Base-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Base-32K) | [BlueLM-7B-Chat-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K) | - |
|
37 |
|
38 |
## 推理部署/Inference and Deployment
|
39 |
|
40 |
```python
|
|
|
22 |
BlueLM is a large-scale open-source language model independently developed by the vivo AI Lab. This release includes 2K and 32K context length versions for both Base and Chat models.
|
23 |
|
24 |
- **High-quality Data**: BlueLM is trained on a high-quality corpus of 2.6 trillion tokens. The training corpus contains Chinese, English, Japanese and Korean data.
|
25 |
+
- **Stronger Performance**: BlueLM-7B-Chat achieves competitive performance on the C-Eval and CMMLU benchmarks among models of the same size.
|
26 |
- **Longer Context**: We have extended the context length of both BlueLM-7B-Base-32K and BlueLM-7B-Chat-32K models from 2K to 32K. The models can support longer context understanding while maintaining the same basic capabilities.
|
27 |
- **Model License**: BlueLM weights are open for academic research and commercial use.
|
28 |
|
|
|
32 |
|
33 |
| | Base Model | Chat Model | 4bits Quantized Chat Model |
|
34 |
|:---:|:--------------------:|:--------------------:|:--------------------------:|
|
35 |
+
| 7B-2K | [BlueLM-7B-Base](https://huggingface.co/vivo-ai/BlueLM-7B-Base) | [BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) | [BlueLM-7B-Chat-4bits](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-4bits) |
|
36 |
| 7B-32K | [BlueLM-7B-Base-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Base-32K) | [BlueLM-7B-Chat-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K) | - |
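
The naming pattern in the table above can be sketched as a small helper. This is only an illustration: `repo_id` and `HF_ORG` are hypothetical names introduced here, and the pattern is inferred solely from the six links listed in the table, so it may not hold for other releases.

```python
# Organization name as it appears in the table's Hugging Face links.
HF_ORG = "vivo-ai"


def repo_id(variant, context="2K", four_bit=False):
    """Hypothetical helper: build a repo ID such as 'vivo-ai/BlueLM-7B-Chat-32K'.

    Only the combinations listed in the table are accepted.
    """
    if variant not in ("Base", "Chat"):
        raise ValueError("variant must be 'Base' or 'Chat'")
    if context not in ("2K", "32K"):
        raise ValueError("context must be '2K' or '32K'")
    # The table lists a 4-bit quantized release only for the 2K Chat model.
    if four_bit and (variant != "Chat" or context != "2K"):
        raise ValueError("4-bit weights are listed only for the 2K Chat model")

    name = f"BlueLM-7B-{variant}"
    if context == "32K":
        name += "-32K"
    if four_bit:
        name += "-4bits"
    return f"{HF_ORG}/{name}"


if __name__ == "__main__":
    print(repo_id("Chat"))
    print(repo_id("Base", context="32K"))
    print(repo_id("Chat", four_bit=True))
```

The returned string is the part of each link after `https://huggingface.co/`.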
|
37 |
|
38 |
+
## 评测结果/Benchmark Results
|
39 |
+
|
40 |
+
为了保证模型评测的一致性,我们采用 [opencompass](https://opencompass.org.cn/leaderboard-llm) 进行相关榜单的评测。我们分别在 C-Eval、MMLU、CMMLU、GaoKao、AGIEval、BBH、GSM8K、MATH 和 HumanEval 榜单对 BlueLM 的通用能力、数学能力和代码能力进行了测试。
|
41 |
+
|
42 |
+
To keep model evaluation consistent, we use [OpenCompass](https://opencompass.org.cn/leaderboard-llm) to evaluate BlueLM on the relevant leaderboards. We tested general ability, mathematical ability and coding ability on the C-Eval, MMLU, CMMLU, GaoKao, AGIEval, BBH, GSM8K, MATH and HumanEval benchmarks.
|
43 |
+
|
44 |
+
| Model | **C-Eval** | **MMLU** | **CMMLU** | **Gaokao** | **AGIEval** | **BBH** | **GSM8K** | **MATH** | **HumanEval** |
|
45 |
+
|:------------------|:-----------|:---------|:----------|:-----------|:------------|:--------|:----------|:---------|:--------------|
|
46 |
+
| | 5-shot | 5-shot | 5-shot | 0-shot | 0-shot | 3-shot | 4-shot | 5-shot | 0-shot |
|
47 |
+
| GPT-4 | 69.9 | 86.4 | 71.2 | 72.3 | 55.1 | 86.7 | 91.4 | 45.8 | 74.4 |
|
48 |
+
| ChatGPT           | 52.5       | 70.0     | 53.9      | 51.1       | 39.9        | 70.1    | 78.2      | 28.0     | 73.2          |
|
49 |
+
| LLaMA2-7B | 32.5 | 45.3 | 31.8 | 18.9 | 21.8 | 38.2 | 16.7 | 3.3 | 12.8 |
|
50 |
+
| ChatGLM2-6B(Base) | 51.7 | 47.9 | 50.0 | - | - | 33.7 | 32.4 | 6.5 | - |
|
51 |
+
| Baichuan2-7B | 56.3 | 54.7 | 57.0 | 34.8 | 34.6 | 41.8 | 24.6 | 5.4 | 17.7 |
|
52 |
+
| BlueLM-7B-Base | 67.5 | 55.2 | 66.6 | 58.9 | 43.4 | 41.7 | 27.2 | 6.2 | 18.3 |
|
53 |
+
| BlueLM-7B-Chat | 72.7 | 50.7 | 74.2 | 48.7 | 43.4 | 65.6 | 51.9 | 13.4 | 21.3 |
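
As a rough aid for reading the table above, the sketch below copies four of its rows and computes per-benchmark score differences between two models. The names `SCORES`, `score_deltas` and `wins` are hypothetical helpers introduced here for illustration; they are not part of BlueLM or OpenCompass.

```python
# Column order and scores copied from the benchmark table above.
BENCHMARKS = [
    "C-Eval", "MMLU", "CMMLU", "Gaokao", "AGIEval",
    "BBH", "GSM8K", "MATH", "HumanEval",
]

SCORES = {
    "LLaMA2-7B":      [32.5, 45.3, 31.8, 18.9, 21.8, 38.2, 16.7, 3.3, 12.8],
    "Baichuan2-7B":   [56.3, 54.7, 57.0, 34.8, 34.6, 41.8, 24.6, 5.4, 17.7],
    "BlueLM-7B-Base": [67.5, 55.2, 66.6, 58.9, 43.4, 41.7, 27.2, 6.2, 18.3],
    "BlueLM-7B-Chat": [72.7, 50.7, 74.2, 48.7, 43.4, 65.6, 51.9, 13.4, 21.3],
}


def score_deltas(model, baseline):
    """Return {benchmark: model score minus baseline score}, rounded to 0.1."""
    return {
        name: round(m - b, 1)
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


def wins(model, baseline):
    """List the benchmarks on which `model` scores strictly higher than `baseline`."""
    return [name for name, d in score_deltas(model, baseline).items() if d > 0]


if __name__ == "__main__":
    for name, delta in score_deltas("BlueLM-7B-Base", "Baichuan2-7B").items():
        print(f"{name:10s} {delta:+.1f}")
```

Differences of a few tenths of a point, such as the BBH gap between BlueLM-7B-Base and Baichuan2-7B, are small enough that they should be read with caution.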
|
54 |
+
|
55 |
## 推理部署/Inference and Deployment
|
56 |
|
57 |
```python
|