facat committed on
Commit
bdcb689
·
1 Parent(s): d5ab356

Update README.md

  1. README.md +203 -34
README.md CHANGED
@@ -45,7 +45,7 @@ pipeline_tag: text-generation
45
  <div style="display: inline-block;">
46
 
47
  <a rel="noopener nofollow" href="https://www.modelscope.cn/organization/sustc/">
48
- <img src="https://img.shields.io/badge/ModelScope-sustc-blue" style="margin: 0 0;">
49
  </a>
50
 
51
  </div>
@@ -78,36 +78,58 @@ pipeline_tag: text-generation
78
 
79
  # News
80
 
81
- - 2023-12-05: SUS-Chat is ranked 2nd in [Open LLM
82
  leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
83
  and surpassed all models under 70B.
84
 
85
- - 2023-12-01: SUS-Chat-34B is now avaliable on HuggingFace🤗.
 
86
 
87
- # Inrtoduction
88
 
89
  <img src="https://hackmd.io/_uploads/HJlDtzhBa.png" id="fig-sus"
90
  alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
91
 
92
- **SUS-Chat** is a 34B bilingual Chinese-English dialogue model, jointly
93
- released by the **Southern University of Science and Technology** and
94
- **Cognitive Computing and Natural Language Center of International
95
- Digital Economy Academy (IDEA-CCNL)**. This model is based on
96
- `01-ai/Yi-34B` and has been fine-tuned on millions of high-quality,
97
- multilingual instruction data. While maintaining the strong language
98
- capabilities of the base model, the SUS-Chat-34B model has improved the
99
- model’s response to human instructions through high-quality instruction
100
- fine-tuning and excels at imitating human thought processes through
101
- chains of thought. It introduces inter-instruction attention sharing in
102
- long texts, expanding the window size from 4K to 8K, significantly
103
- enhancing the usability of multi-round dialogues.
 
104
 
105
  It has surpassed all models of the same size in almost all benchmark
106
  tests and is better suited to meet the practical needs of complex
107
  multilingual tasks. Compared to larger models, SUS-Chat-34B remains
108
- highly competitive and achieved state-of-the-art performance in our
109
  comprehensive evaluations.
110
 
111
  SUS-Chat powerfully demonstrates that through the right instruction
112
  fine-tuning, academic institutions can achieve better performance
113
  without increasing model parameters, using open-source datasets and
@@ -124,10 +146,9 @@ open-sourced the evaluation framework
124
  replication and comparison by other researchers.
125
 
126
  In TLEM, we utilized various benchmark tests including MMLU, CMMLU,
127
- C-Eval, BBH, GSM-8K, and MATH, focusing on measuring the model’s
128
- knowledge and thinking capabilities. In these metrics, the SUS-Chat-34B
129
- model achieved state-of-the-art performance. Additionally, we
130
- incorporated
131
  [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test
132
  SUS-Chat and similar models on winogrande, hellaswag, arc, and
133
  truthful-qa, assessing the model’s common-sense reasoning ability and
@@ -136,28 +157,176 @@ susceptibility to illusions.
136
  Overall, the SUS-Chat-34B model significantly outperformed models of
137
  similar scale and achieved the most advanced comprehensive performance.
138
 
139
- | model | mmlu-chat | cmmlu-chat | ceval-chat | gsm8k | BBH | MATH | winogrande | arc | hellaswag | truthfulqa | average |
140
- |:------------------|----------:|-----------:|-----------:|------:|------:|------:|-----------:|------:|----------:|-----------:|--------:|
141
- | GPT-4 | 83 | 71 | 69.9 | 91.4 | 86.7 | 45.8 | 87.5 | 94.5 | 91.4 | nan | 80.1333 |
142
- | SUS-Chat-34B | 77.35 | 78.68 | 82.42 | 80.06 | 67.62 | 28.8 | 81.22 | 81.54 | 83.79 | 57.47 | 71.895 |
143
- | Qwen-72B-Chat | 74.52 | 77.02 | 77.22 | 76.57 | 72.63 | 35.9 | 80.58 | 81.29 | 87.02 | 50.64 | 71.339 |
144
- | DeepSeek-67B-Chat | 69.43 | 48.51 | 59.7 | 74.45 | 69.73 | 29.56 | 76.09 | 82.1 | 86.06 | 56.37 | 65.2 |
145
- | OrionStar-34B | 68.51 | 66.88 | 65.13 | 54.36 | 62.88 | 12.8 | 77.27 | 80.19 | 84.54 | 53.24 | 62.58 |
146
- | Yi-34B-Chat | 66.96 | 55.16 | 77.16 | 63.76 | 61.54 | 10.02 | 76.64 | 70.66 | 82.29 | 54.57 | 61.876 |
147
-
148
  <img
149
  src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/radar.png"
150
  id="fig-bench" alt="Figure 2: Benchmark" />
151
 
152
  # Usage
153
 
154
  SUS-Chat-34B is a standard LLaMA model and should be seamlessly
155
  compatible with the LLaMA ecosystem. We provide the following example to
156
  demonstrate how it can be used for multi-turn dialogues.
157
 
158
- ``` python
159
- from transformers import AutoModelForCausalLM, AutoTokenizer
 
160
 
161
 
162
  def chat_template(messages):
163
  history = ""
@@ -230,5 +399,5 @@ model.
230
 
231
  This model is developed entirely for academic research and free
232
  commercial use, but it must adhere to the
233
- [license](https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt)
234
- from 01-ai.
 
45
  <div style="display: inline-block;">
46
 
47
  <a rel="noopener nofollow" href="https://www.modelscope.cn/organization/sustc/">
48
+ <img src="https://img.shields.io/badge/🤖ModelScope-sustc-blue" style="margin: 0 0;">
49
  </a>
50
 
51
  </div>
 
78
 
79
  # News
80
 
81
+ - 2023-12-06: Try [SUS-Chat-34B
82
+ chat-ui](https://huggingface.co/spaces/SUSTech/SUS-Chat-34B).
83
+
84
+ - 2023-12-05: SUS-Chat-34B is now available on
85
+ [ModelScope🤖](https://www.modelscope.cn/models/SUSTC/SUS-Chat-34B/summary)
86
+
87
+ - 2023-12-05: SUS-Chat-34B is ranked 2nd in [Open LLM
88
  leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
89
  and surpassed all models under 70B.
90
 
91
+ - 2023-12-01: SUS-Chat-34B is now available on
92
+ [HuggingFace🤗](https://huggingface.co/SUSTech/SUS-Chat-34B).
93
 
94
+ # Introduction
95
 
96
  <img src="https://hackmd.io/_uploads/HJlDtzhBa.png" id="fig-sus"
97
  alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
98
 
99
+ **SUS-Chat-34B** is a 34B bilingual Chinese-English dialogue model,
100
+ jointly released by the **[Southern University of Science and
101
+ Technology](https://huggingface.co/SUSTech)** and
102
+ **[IDEA-CCNL](https://huggingface.co/IDEA-CCNL)**. This model is based
103
+ on [`01-ai/Yi-34B`](https://huggingface.co/01-ai/Yi-34B) and has been
104
+ fine-tuned on millions of high-quality, multilingual instruction data.
105
+ While maintaining the strong language capabilities of the base model,
106
+ the SUS-Chat-34B model has improved the model’s response to human
107
+ instructions through high-quality instruction fine-tuning and excels at
108
+ imitating human thought processes through chains of thought. It
109
+ introduces inter-instruction attention sharing in long texts, expanding
110
+ the window size from 4K to 8K, significantly enhancing the usability of
111
+ multi-turn dialogues.
112
 
113
  It has surpassed all models of the same size in almost all benchmark
114
  tests and is better suited to meet the practical needs of complex
115
  multilingual tasks. Compared to larger models, SUS-Chat-34B remains
116
+ highly competitive and has achieved state-of-the-art performance in our
117
  comprehensive evaluations.
118
 
119
+ SUS-Chat-34B model has the following highlights: 1. Large-scale complex
120
+ instruction following data: Trained with 1.4 billion tokens of
121
+ high-quality complex instruction data, covering Chinese and English,
122
+ multi-turn dialogues, mathematics, reasoning, and various other types of
123
+ instruction data; 2. Strong performance in general tasks: The
124
+ SUS-Chat-34B model excels in numerous mainstream Chinese and English
125
+ tasks, surpassing other open-source instruction fine-tuned models of the
126
+ same parameter scale. It also competes well against models with larger
127
+ parameter scales; 3. Longer context window and excellent multi-turn
128
+ dialogue capabilities: Currently, SUS-Chat-34B supports an 8K context
129
+ window, and is trained with a large amount of multi-turn instruction and
130
+ single-multi-turn mixed data, demonstrating remarkable capabilities in
131
+ long-text dialogue information focus and instruction follow-up.
132
+
133
  SUS-Chat powerfully demonstrates that through the right instruction
134
  fine-tuning, academic institutions can achieve better performance
135
  without increasing model parameters, using open-source datasets and
 
146
  replication and comparison by other researchers.
147
 
148
  In TLEM, we utilized various benchmark tests including MMLU, CMMLU,
149
+ C-Eval, BBH, GSM-8K, and MATH, to measure the model’s knowledge and
150
+ thinking capabilities. In these metrics, the SUS-Chat-34B model achieved
151
+ state-of-the-art performance. Additionally, we incorporated
 
152
  [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test
153
  SUS-Chat and similar models on winogrande, hellaswag, arc, and
154
  truthful-qa, assessing the model’s common-sense reasoning ability and
 
157
  Overall, the SUS-Chat-34B model significantly outperformed models of
158
  similar scale and achieved the most advanced comprehensive performance.
159
 
160
  <img
161
  src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/radar.png"
162
  id="fig-bench" alt="Figure 2: Benchmark" />
163
 
164
+ <div>
165
+
166
+ <table>
167
+ <colgroup>
168
+ <col style="width: 50%" />
169
+ <col style="width: 50%" />
170
+ </colgroup>
171
+ <tbody>
172
+ <tr class="odd">
173
+ <td style="text-align: center;"><div width="50.0%"
174
+ data-layout-align="center">
175
+ <h2 id="english-understanding">English Understanding</h2>
176
+ <table>
177
+ <thead>
178
+ <tr class="header">
179
+ <th style="text-align: right;">Model</th>
180
+ <th style="text-align: center;">mmlu (0-shot)</th>
181
+ </tr>
182
+ </thead>
183
+ <tbody>
184
+ <tr class="odd">
185
+ <td style="text-align: right;">GPT-4</td>
186
+ <td style="text-align: center;">83</td>
187
+ </tr>
188
+ <tr class="even">
189
+ <td style="text-align: right;">SUS-Chat-34B</td>
190
+ <td style="text-align: center;"><span
191
+ class="math inline">$\underline{74.35}$</span></td>
192
+ </tr>
193
+ <tr class="odd">
194
+ <td style="text-align: right;">Qwen-72b-Chat</td>
195
+ <td style="text-align: center;"><strong>74.52</strong></td>
196
+ </tr>
197
+ <tr class="even">
198
+ <td style="text-align: right;">Deepseek-68b-Chat</td>
199
+ <td style="text-align: center;">69.43</td>
200
+ </tr>
201
+ <tr class="odd">
202
+ <td style="text-align: right;">OrionStar-Yi-34B-Chat</td>
203
+ <td style="text-align: center;">68.51</td>
204
+ </tr>
205
+ <tr class="even">
206
+ <td style="text-align: right;">Yi-34B-Chat</td>
207
+ <td style="text-align: center;">66.96</td>
208
+ </tr>
209
+ </tbody>
210
+ </table>
211
+ </div></td>
212
+ <td style="text-align: center;"><div width="50.0%"
213
+ data-layout-align="center">
214
+ <h2 id="chinese-capabilities">Chinese Capabilities</h2>
215
+ <table>
216
+ <colgroup>
217
+ <col style="width: 34%" />
218
+ <col style="width: 32%" />
219
+ <col style="width: 32%" />
220
+ </colgroup>
221
+ <thead>
222
+ <tr class="header">
223
+ <th style="text-align: right;">Model</th>
224
+ <th style="text-align: center;">cmmlu (0-shot)</th>
225
+ <th style="text-align: center;">C-Eval (0-shot)<a href="#fn1"
226
+ class="footnote-ref" id="fnref1"
227
+ role="doc-noteref"><sup>1</sup></a></th>
228
+ </tr>
229
+ </thead>
230
+ <tbody>
231
+ <tr class="odd">
232
+ <td style="text-align: right;">GPT-4</td>
233
+ <td style="text-align: center;">71</td>
234
+ <td style="text-align: center;">69.9</td>
235
+ </tr>
236
+ <tr class="even">
237
+ <td style="text-align: right;">SUS-Chat-34B</td>
238
+ <td style="text-align: center;"><strong>78.68</strong></td>
239
+ <td style="text-align: center;"><strong>82.42</strong></td>
240
+ </tr>
241
+ <tr class="odd">
242
+ <td style="text-align: right;">Qwen-72b-Chat</td>
243
+ <td style="text-align: center;"><span
244
+ class="math inline">$\underline{77.02}$</span></td>
245
+ <td style="text-align: center;"><span
246
+ class="math inline">$\underline{77.22}$</span></td>
247
+ </tr>
248
+ <tr class="even">
249
+ <td style="text-align: right;">Deepseek-68b-Chat</td>
250
+ <td style="text-align: center;">48.51</td>
251
+ <td style="text-align: center;">59.7</td>
252
+ </tr>
253
+ <tr class="odd">
254
+ <td style="text-align: right;">OrionStar-Yi-34B-Chat</td>
255
+ <td style="text-align: center;">66.88</td>
256
+ <td style="text-align: center;">65.13</td>
257
+ </tr>
258
+ <tr class="even">
259
+ <td style="text-align: right;">Yi-34B-Chat</td>
260
+ <td style="text-align: center;">55.16</td>
261
+ <td style="text-align: center;">77.16</td>
262
+ </tr>
263
+ </tbody>
264
+ </table>
265
+ </div></td>
266
+ </tr>
267
+ </tbody>
268
+ </table>
269
+ <section id="footnotes" class="footnotes footnotes-end-of-document"
270
+ role="doc-endnotes">
271
+ <hr />
272
+ <ol>
273
+ <li id="fn1"><p>C-Eval results are evaluated on the validation
274
+ datasets<a href="#fnref1" class="footnote-back"
275
+ role="doc-backlink">↩︎</a></p></li>
276
+ </ol>
277
+ </section>
278
+
279
+ </div>
280
+
281
+ ## Math & Reasoning
282
+
283
+ | Model | gsm8k (0-shot) | MATH (0-shot) | BBH (0-shot) |
284
+ |----------------------:|:-------------------:|:-------------------:|:-------------------:|
285
+ | GPT-4 | 91.4 | 45.8 | 86.7 |
286
+ | SUS-Chat-34B | **80.06** | 28.7 | 67.62 |
287
+ | Qwen-72b-Chat | $\underline{76.57}$ | **35.9** | **72.63** |
288
+ | Deepseek-68b-Chat | 74.45 | $\underline{29.56}$ | $\underline{69.73}$ |
289
+ | OrionStar-Yi-34B-Chat | 54.36 | 12.8 | 62.88 |
290
+ | Yi-34B-Chat | 63.76 | 10.02 | 61.54 |
291
+
292
+ ## More Tasks
293
+
294
+ | Model | winogrande (5-shot) | arc (25-shot) | hellaswag (10-shot) | TruthfulQA mc1 (0-shot) | TruthfulQA mc2 (0-shot) |
295
+ |----------------------:|:-------------------:|:-------------------:|:-------------------:|:-----------------------:|:-----------------------:|
296
+ | GPT-4 | — | 94.5 | 91.4 | 59.00 | — |
297
+ | SUS-Chat-34B | **81.22** | $\underline{81.54}$ | 83.79 | **40.64** | **57.47** |
298
+ | Qwen-72b-Chat | 76.09 | **82.10** | $\underline{86.06}$ | 39.17 | $\underline{56.37}$ |
299
+ | Deepseek-68b-Chat | $\underline{80.58}$ | 81.29 | **87.02** | $\underline{40.02}$ | 50.64 |
300
+ | OrionStar-Yi-34B-Chat | 77.27 | 80.19 | 84.54 | 36.47 | 53.24 |
301
+ | Yi-34B-Chat | 76.64 | 70.66 | 82.29 | 38.19 | 54.57 |
302
+
303
+ ## Overall
304
+
305
+ | Model | Average |
306
+ |----------------------:|:---------:|
307
+ | SUS-Chat-34B | **69.05** |
308
+ | Qwen-72b-Chat | 68.41 |
309
+ | Deepseek-68b-Chat | 62.91 |
310
+ | OrionStar-Yi-34B-Chat | 60.21 |
311
+ | Yi-34B-Chat | 59.72 |
312
+
313
+ To reproduce the results, please start a corresponding vllm server and
314
+ refer to
315
+ [here](https://sustech-tlem.static.hf.space/index.html#start-evaluating-your-model-in-3-line).
316
+
317
  # Usage
318
 
319
  SUS-Chat-34B is a standard LLaMA model and should be seamlessly
320
  compatible with the LLaMA ecosystem. We provide the following example to
321
  demonstrate how it can be used for multi-turn dialogues.
322
 
323
+ Feel free to [open an
324
+ issue](https://github.com/SUSTech-IDEA/SUS-Chat/issues) if you have any
325
+ questions.
326
 
327
+ ``` python
328
+ from transformers import AutoModelForCausalLM, AutoTokenizer # 🤗 Transformers, or
329
+ # from modelscope import AutoModelForCausalLM, AutoTokenizer # 🤖 ModelScope
330
 
331
  def chat_template(messages):
332
  history = ""
 
399
 
400
  This model is developed entirely for academic research and free
401
  commercial use, but it must adhere to the
402
+ [license](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt)
403
+ from [01-ai](https://huggingface.co/01-ai).