Update README.md
README.md
CHANGED
@@ -11,7 +11,13 @@ widget:
-#
@@ -53,7 +59,7 @@ pipeline_tag: text-generation
-  <a rel="noopener nofollow" href="https://github.com/
@@ -69,29 +75,63 @@ pipeline_tag: text-generation
-#
-<img src="https://hackmd.io/_uploads/S1dXCTIHp.png" id="fig-sus"
-alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
-This bridges the gap between academia and industry in large language models and opens up new possibilities for collaboration between the two.
-GSM-8K, MATH,
-focus on measuring the model's knowledge and reasoning abilities; on these metrics the SUS-Chat-34B model achieved state-of-the-art performance. We also used [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test SUS-Chat and comparable models on winogrande,
-hellaswag, arc, and truthful-qa, measuring the models' common-sense reasoning and hallucination.
@@ -104,9 +144,11 @@ hellaswag, arc, and truthful-qa, measuring the models' common-sense reasoning and hallucination
-#
-SUS-Chat-34B
@@ -153,16 +195,31 @@ response = tokenizer.decode(
-#
-SUS-Chat
-#
pipeline_tag: text-generation
---

# 🐷SUS-Chat: Instruction tuning done right

<p align="left">
    <a href="README_CN.md">中文</a>  |  English
</p>

<br><br>

<div align="center">

<div style="display: inline-block;">

  <a rel="noopener nofollow" href="https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt">
  <img src="https://img.shields.io/badge/Model_License-Model_Agreement-lightblue" style="margin: 0 0;">
  </a>

</div>

# News

- 2023-12-05: SUS-Chat is ranked 2nd on the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

- 2023-12-01: SUS-Chat-34B is now available on HuggingFace🤗.

# Introduction

<img src="https://hackmd.io/_uploads/HJlDtzhBa.png" id="fig-sus"
alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />

**SUS-Chat** is a 34B bilingual Chinese-English dialogue model, jointly released by the **Southern University of Science and Technology** and the **International Digital Economy Academy**. The SUS-Chat-34B model has been fine-tuned on millions of high-quality, multilingual instruction examples. While maintaining the strong language capabilities of the base model, it has improved its responses to human instructions through high-quality instruction fine-tuning and excels at imitating human thought processes through chains of thought. It introduces inter-instruction attention sharing in long texts, expanding the context window from 4K to 8K and significantly enhancing the usability of multi-round dialogues.

It has surpassed all models of the same size in almost all benchmark tests and is better suited to meet the practical needs of complex multilingual tasks. Compared to larger models, SUS-Chat-34B remains highly competitive and achieved state-of-the-art performance in our comprehensive evaluations.

SUS-Chat powerfully demonstrates that, with the right instruction fine-tuning, academic institutions can achieve better performance without increasing model parameters, using open-source datasets and models. This bridges the gap between academia and industry in large language models and opens new possibilities for collaboration between the academic and industrial sectors.

# Performance

To better evaluate the performance of the SUS-Chat-34B model, we conducted assessments across multiple benchmark tests and have open-sourced the evaluation framework [TLEM](https://huggingface.co/spaces/SUSTech/tlem) to facilitate replication and comparison by other researchers.

In TLEM, we utilized various benchmark tests including MMLU, CMMLU, C-Eval, BBH, GSM-8K, and MATH, focusing on measuring the model's knowledge and reasoning capabilities. On these metrics, the SUS-Chat-34B model achieved state-of-the-art performance. Additionally, we incorporated [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test SUS-Chat and comparable models on winogrande, hellaswag, arc, and truthful-qa, assessing the models' common-sense reasoning ability and susceptibility to hallucination.
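The table below reports a single average column alongside the per-benchmark scores. A minimal sketch of that aggregation, assuming it is an unweighted arithmetic mean over the ten benchmarks (the scores here are placeholders for illustration, not SUS-Chat-34B's actual results):

``` python
# Sketch: combine per-benchmark scores into one "average" column value.
# NOTE: placeholder scores for illustration only; we assume the average
# is an unweighted arithmetic mean over all ten benchmarks.
scores = {
    "mmlu-chat": 75.0, "cmmlu-chat": 78.0, "ceval-chat": 80.0,
    "gsm8k": 80.0, "BBH": 67.0, "MATH": 28.0, "winogrande": 81.0,
    "arc": 81.0, "hellaswag": 83.0, "truthfulqa": 57.0,
}

average = sum(scores.values()) / len(scores)
print(f"average over {len(scores)} benchmarks: {average:.2f}")
```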

Overall, the SUS-Chat-34B model significantly outperformed models of similar scale and achieved state-of-the-art overall performance.

| model | mmlu-chat | cmmlu-chat | ceval-chat | gsm8k | BBH | MATH | winogrande | arc | hellaswag | truthfulqa | average |
|:------------------|----------:|-----------:|-----------:|------:|------:|------:|-----------:|------:|----------:|-----------:|--------:|

<img src="assets/radar.png" id="fig-bench" alt="Figure 2: Benchmark" />

# Usage

SUS-Chat-34B is a standard LLaMA model and should be seamlessly compatible with the LLaMA ecosystem. We provide the following example to demonstrate how it can be used for multi-turn dialogues.

``` python
from transformers import AutoModelForCausalLM, AutoTokenizer
# …
messages.append({"role": "assistant", "content": response})
```
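A minimal, self-contained sketch of the multi-turn bookkeeping such an example performs — note that the `build_prompt` helper and its `### Human:`/`### Assistant:` template are illustrative assumptions, not the README's actual code or SUS-Chat's confirmed prompt format:

``` python
# Hypothetical sketch of a multi-turn loop. build_prompt and its
# "### Human:"/"### Assistant:" template are assumptions for
# illustration, not the README's actual code.
def build_prompt(messages):
    """Flatten the chat history into a single prompt string."""
    parts = []
    for m in messages:
        role = "Human" if m["role"] == "user" else "Assistant"
        parts.append(f"### {role}: {m['content']}")
    parts.append("### Assistant:")  # cue the model to answer next
    return "\n\n".join(parts)

messages = [{"role": "user", "content": "Hi, who are you?"}]
prompt = build_prompt(messages)

# In the real example, `prompt` is tokenized and passed to
# model.generate(); here a stand-in replaces the decoded output.
response = "I am SUS-Chat."
messages.append({"role": "assistant", "content": response})
```

Each turn appends to `messages`, so the full history is re-flattened into the prompt on every generation call.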

# Limitations

SUS-Chat has only undergone supervised fine-tuning and has not yet been trained on human preference learning. As a result, it may produce unreasonable responses in some situations and exacerbate existing issues in language models, including hallucinations, non-determinism, and cumulative errors. To achieve better performance on downstream tasks, we recommend adjusting the generation configuration parameters accordingly.
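For example, the decoding parameters most worth tuning can be collected into a single configuration; the values below are illustrative starting points, not the authors' recommended settings:

``` python
# Illustrative generation settings to tune per downstream task.
# NOTE: example values only, not the authors' recommendation.
gen_config = {
    "do_sample": True,          # sample instead of greedy decoding
    "temperature": 0.7,         # lower => more deterministic output
    "top_p": 0.9,               # nucleus-sampling cutoff
    "repetition_penalty": 1.1,  # discourages cumulative repetition
    "max_new_tokens": 512,      # bound on response length
}

# With Hugging Face transformers these can be passed straight through,
# e.g. model.generate(**inputs, **gen_config).
print(sorted(gen_config))
```

Lowering `temperature` and `top_p` trades diversity for determinism, which can help with the non-determinism and hallucination issues noted above.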

# Disclaimer

During training, we used data-compliance checking algorithms to ensure the compliance of the trained model as much as possible. Due to the complexity of the data and the diverse use cases of language models, we cannot guarantee that the model will produce correct and reasonable outputs in all scenarios. Please be aware that there is still a risk of the model generating problematic outputs. We will not be responsible for any risks or issues arising from misuse, misguidance, illegal use, and related misinformation, or from data security issues related to the model.

# License

This model is developed entirely for academic research and free commercial use, but it must adhere to the [license](https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt) from 01-ai.