facat committed · Commit 1451c6b · 1 Parent(s): 47aff61

Update README.md

Files changed (1)
  1. README.md +90 -33
README.md CHANGED
@@ -11,7 +11,13 @@ widget:
  pipeline_tag: text-generation
  ---
 
- # 🐗SUS-Chat: Instruction tuning done right
+ # 🐷SUS-Chat: Instruction tuning done right
+
+ <p align="left">
+ <a href="README_CN.md">中文</a>&nbsp; | &nbsp;English&nbsp;
+ </p>
+
+ <br><br>
 
  <div align="center">
@@ -53,7 +59,7 @@ pipeline_tag: text-generation
 
  <div style="display: inline-block;">
 
- <a rel="noopener nofollow" href="https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt">
+ <a rel="noopener nofollow" href="https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt">
  <img src="https://img.shields.io/badge/Model_License-Model_Agreement-lightblue" style="margin: 0 0;">
  </a>
@@ -69,29 +75,63 @@ pipeline_tag: text-generation
 
  </div>
 
- # Introduction
-
- <img src="https://hackmd.io/_uploads/S1dXCTIHp.png" id="fig-sus"
- alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
-
- **SUS-Chat** is a 34B bilingual Chinese-English dialogue model, jointly released by the Southern University of Science and Technology and the International Digital Economy Academy. Fine-tuned on millions of high-quality, multilingual instruction data, SUS-Chat-34B retains the strong language ability of the base model while improving its responses to human instructions and excelling at imitating human thought processes through chain-of-thought reasoning.
-
- It surpasses all models of the same size on almost all benchmarks and better meets the practical needs of complex multilingual tasks; compared with larger models, SUS-Chat-34B remains highly competitive and achieved state-of-the-art results in our comprehensive evaluations.
-
- SUS-Chat convincingly demonstrates that, with the right instruction fine-tuning, academic institutions can obtain better performance from open-source datasets and models without increasing model parameters, bridging the gap between academia and industry in large language models and opening new possibilities for their collaboration.
-
- # Performance
-
- To better evaluate the performance of the SUS-Chat-34B model, we evaluated it on multiple benchmarks and open-sourced the evaluation framework [TLEM](https://huggingface.co/spaces/SUSTech/tlem) so that other researchers can reproduce and compare the results.
-
- In TLEM we used several benchmarks, including MMLU, CMMLU, C-Eval, BBH, GSM-8K, and MATH, which focus on measuring the model's knowledge and reasoning ability; on these metrics SUS-Chat-34B achieved state-of-the-art results. We additionally used [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test SUS-Chat and comparable models on winogrande, hellaswag, arc, and truthful-qa, measuring common-sense reasoning and hallucination.
-
- Overall, the SUS-Chat-34B model significantly outperforms models of the same scale and achieves state-of-the-art comprehensive performance.
+ # News
+
+ - 2023-12-05: SUS-Chat is ranked 2nd in the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+
+ - 2023-12-01: SUS-Chat-34B is now available on HuggingFace🤗.
+
+ # Introduction
+
+ <img src="https://hackmd.io/_uploads/HJlDtzhBa.png" id="fig-sus"
+ alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
+
+ **SUS-Chat** is a 34B bilingual Chinese-English dialogue model, jointly released by the **Southern University of Science and Technology** and the **International Digital Economy Academy**. The SUS-Chat-34B model has been fine-tuned on millions of high-quality, multilingual instruction data. While maintaining the strong language capabilities of the base model, SUS-Chat-34B improves its responses to human instructions through high-quality instruction fine-tuning and excels at imitating human thought processes through chains of thought. It introduces inter-instruction attention sharing in long texts, expanding the window length from 4K to 8K and significantly enhancing the usability of multi-round dialogues.
+
+ It has surpassed all models of the same size in almost all benchmark tests and is better suited to meet the practical needs of complex multilingual tasks. Compared to larger models, SUS-Chat-34B remains highly competitive and achieved state-of-the-art performance in our comprehensive evaluations.
+
+ SUS-Chat powerfully demonstrates that, through the right instruction fine-tuning, academic institutions can achieve better performance without increasing model parameters, using open-source datasets and models. This bridges the gap between academia and industry in large language models and opens new possibilities for collaboration between the academic and industrial sectors.
+
+ # Performance
+
+ To better evaluate the performance of the SUS-Chat-34B model, we conducted assessments across multiple benchmark tests and have open-sourced the evaluation framework [TLEM](https://huggingface.co/spaces/SUSTech/tlem) to facilitate replication and comparison by other researchers.
+
+ In TLEM, we utilized various benchmark tests, including MMLU, CMMLU, C-Eval, BBH, GSM-8K, and MATH, focusing on measuring the model's knowledge and reasoning capabilities. On these metrics the SUS-Chat-34B model achieved state-of-the-art performance. Additionally, we incorporated [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test SUS-Chat and similar models on winogrande, hellaswag, arc, and truthful-qa, assessing the model's common-sense reasoning ability and susceptibility to hallucination.
+
+ Overall, the SUS-Chat-34B model significantly outperformed models of similar scale and achieved the most advanced comprehensive performance.
 
  | model | mmlu-chat | cmmlu-chat | ceval-chat | gsm8k | BBH | MATH | winogrande | arc | hellaswag | truthfulqa | average |
  |:------------------|----------:|-----------:|-----------:|------:|------:|------:|-----------:|------:|----------:|-----------:|--------:|
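For readers who want to reproduce the common-sense benchmark columns above, here is a minimal sketch of an lm-eval run. It assumes the 2023-era v0.3 harness API (`lm_eval.evaluator.simple_evaluate`); the backend name, task names, and model arguments are illustrative, so consult the harness documentation for the interface of your installed version.

``` python
# Minimal sketch: scoring SUS-Chat on the common-sense tasks with
# lm-evaluation-harness. Assumes the v0.3-era API (pip install lm-eval==0.3.0);
# backend and task names may differ in other harness versions.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # plain HuggingFace causal-LM backend
    model_args="pretrained=SUSTech/SUS-Chat-34B,dtype=bfloat16",
    tasks=["winogrande", "hellaswag", "arc_challenge", "truthfulqa_mc"],
    batch_size=1,
)

# Print the per-task metric dictionaries (accuracy, normalized accuracy, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```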
@@ -104,9 +144,11 @@ hellaswag, arc, truthful-qa, measuring common-sense reasoning and hallucination
 
  <img src="assets/radar.png" id="fig-bench" alt="Figure 2: Benchmark" />
 
- # Usage
+ # Usage
 
- SUS-Chat-34B is a standard LLaMA model; its usage and development environment are the same as for most other open-source models, and multi-turn dialogue can be conducted as follows:
+ SUS-Chat-34B is a standard LLaMA model and should be seamlessly compatible with the LLaMA ecosystem. We provide the following example to demonstrate how it can be used for multi-turn dialogues.
 
  ``` python
  from transformers import AutoModelForCausalLM, AutoTokenizer
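The diff elides the body of the README's example above. As a self-contained illustration of the same multi-turn pattern, the sketch below uses only the generic transformers API; the prompt-formatting helper `build_prompt` is a hypothetical stand-in for the model's actual chat template, not the README's elided code.

``` python
# Self-contained sketch of a multi-turn loop (not the README's elided code).
# `build_prompt` is a hypothetical stand-in for the model's real chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "SUSTech/SUS-Chat-34B"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

def build_prompt(messages):
    # Hypothetical formatting: one "### User:"/"### Assistant:" line per turn.
    parts = [f"### {m['role'].capitalize()}: {m['content']}" for m in messages]
    return "\n".join(parts) + "\n### Assistant:"

messages = [{"role": "user", "content": "Explain chain-of-thought prompting."}]
inputs = tokenizer(build_prompt(messages), return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
# Append the reply so the next turn sees the full history.
messages.append({"role": "assistant", "content": response})
```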
@@ -153,16 +195,31 @@ response = tokenizer.decode(
  messages.append({"role": "assistant", "content": response})
  ```
 
- # Limitations
-
- SUS-Chat has only undergone supervised fine-tuning and has not been trained with human preference learning, so in some situations it may produce unreasonable responses and amplify existing problems of language models, including hallucination, non-determinism, and cumulative error. To achieve performance better suited to downstream tasks, we recommend adjusting the generation-time configuration parameters accordingly.
-
- # Disclaimer
-
- We used data compliance checking algorithms during training to ensure, as far as possible, the compliance of the trained model. Given the complexity of the data and the diversity of language-model use cases, we cannot guarantee that the model will generate correct and reasonable outputs in all scenarios. Please note that there is still a risk of the model producing problematic outputs. We assume no responsibility for any risks or problems arising from misuse, misguidance, illegal use, related misinformation, or associated data security issues.
-
- # License
-
- This model is developed entirely for academic research and free commercial use, but it must comply with the [license](https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt) from 01.AI (零一万物).
+ # Limitations
+
+ SUS-Chat has only undergone supervised fine-tuning and has not yet been trained on human preference learning. As a result, it may produce unreasonable responses in some situations and exacerbate existing issues in language models, including hallucinations, non-determinism, and cumulative errors. To achieve better performance on downstream tasks, we recommend adjusting the generation configuration parameters accordingly.
+
+ # Disclaimer
+
+ During the training process, we used data compliance check algorithms to ensure the compliance of the trained model as much as possible. Due to the complexity of the data and the diverse use cases of language models, we cannot guarantee that the model will produce correct and reasonable outputs in all scenarios. Please be aware that there is still a risk of the model generating problematic outputs. We will not be responsible for any risks or issues arising from misuse, misguidance, illegal use, and related misinformation, as well as data security issues related to the model.
+
+ # License
+
+ This model is developed entirely for academic research and free commercial use, but it must adhere to the [license](https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt) from 01-ai.
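As a concrete illustration of the generation-parameter tuning recommended under Limitations, the sketch below adjusts a few standard transformers sampling knobs; the values are starting points for experimentation, not tuned recommendations for SUS-Chat.

``` python
# Illustrative generation-parameter tuning (example values, not tuned defaults).
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # lower -> more deterministic outputs
    top_p=0.9,               # nucleus sampling cutoff
    repetition_penalty=1.1,  # discourage repetition / cumulative errors
    max_new_tokens=512,
)

# Pass the config to generate(), reusing `model` and `inputs` from the example above:
# output_ids = model.generate(**inputs, generation_config=gen_config)
```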