Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

MiniChat-2-3B - bnb 4bits
- Model creator: https://huggingface.co/GeneZC/
- Original model: https://huggingface.co/GeneZC/MiniChat-2-3B/

Original model description:
---
language:
- en
- zh
license: apache-2.0
library_name: transformers
widget:
- text: <s> [|User|] Hi 👋 </s>[|Assistant|]
model-index:
- name: MiniChat-2-3B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 44.88
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-2-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 67.69
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-2-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 47.59
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-2-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 49.64
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-2-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.46
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-2-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 32.68
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-2-3B
      name: Open LLM Leaderboard
---

## MiniChat-2-3B

📑 [arXiv](https://arxiv.org/abs/2311.07052) | 👻 [GitHub](https://github.com/GeneZC/MiniMA) | 🤗 [HuggingFace-MiniMA](https://huggingface.co/GeneZC/MiniMA-3B) | 🤗 [HuggingFace-MiniChat](https://huggingface.co/GeneZC/MiniChat-3B) | 🤖 [ModelScope-MiniMA](https://modelscope.cn/models/GeneZC/MiniMA-3B) | 🤖 [ModelScope-MiniChat](https://modelscope.cn/models/GeneZC/MiniChat-3B) | 🤗 [HuggingFace-MiniChat-1.5](https://huggingface.co/GeneZC/MiniChat-1.5-3B) | 🤗 [HuggingFace-MiniMA-2](https://huggingface.co/GeneZC/MiniMA-2-3B) | 🤗 [HuggingFace-MiniChat-2](https://huggingface.co/GeneZC/MiniChat-2-3B)

🆕 **Updates from MiniChat-3B**:
- better base model MiniMA-2-3B;
- better data mixture;
- use of [NEFTune](https://arxiv.org/abs/2310.05914);
- use of [DPO](https://arxiv.org/abs/2305.18290).

❗ MiniChat-2-3B is derived from LLaMA2, so its use must comply with the LLaMA2 LICENSE.

A language model continued from MiniMA-3B and finetuned on both instruction and preference data.

It surpasses Vicuna-7B and approaches LLaMA-2-Chat-7B on MT-Bench.

<img src="./teaser_b.jpg" alt="teaser_b" width="687" />

**Standard Benchmarks**

|Method|TFLOPs|MMLU (5-shot)|CEval (5-shot)|DROP (3-shot)|HumanEval (0-shot)|BBH (3-shot)|GSM8K (8-shot)|
|--|--|--|--|--|--|--|--|
|Mamba-2.8B|4.6E9|25.58|24.74|15.72|7.32|29.37|3.49|
|ShearedLLaMA-2.7B|0.8E9|26.97|22.88|19.98|4.88|30.48|3.56|
|BTLM-3B|11.3E9|27.20|26.00|17.84|10.98|30.87|4.55|
|StableLM-3B|72.0E9|44.75|31.05|22.35|15.85|32.59|10.99|
|Qwen-1.8B|23.8E9|44.05|54.75|12.97|14.02|30.80|22.97|
|Phi-2-2.8B|159.9E9|56.74|34.03|30.74|46.95|44.13|55.42|
|LLaMA-2-7B|84.0E9|46.00|34.40|31.57|12.80|32.02|14.10|
||
|MiniMA-3B|4.0E9|28.51|28.23|22.50|10.98|31.61|8.11|
|MiniChat-3B|4.0E9|38.40|36.48|22.58|18.29|31.36|29.72|
|MiniMA-2-3B|13.4E9|40.14|44.65|23.10|14.63|31.43|8.87|
|MiniChat-2-3B|13.4E9|46.17|43.91|30.26|22.56|34.95|38.13|

**Instruction-following Benchmarks**

|Method|AlpacaEval|MT-Bench|MT-Bench-ZH|
|--|--|--|--|
|GPT-4|95.28|9.18|8.96|
|Zephyr-7B-Beta|90.60|7.34|6.27<sup>#</sup>|
|Vicuna-7B|76.84|6.17|5.22<sup>#</sup>|
|LLaMA-2-Chat-7B|71.37|6.27|5.43<sup>#</sup>|
|Qwen-Chat-7B|-|-|6.24|
|Phi-2-DPO|81.37|-|1.59<sup>#</sup><sup>$</sup>|
|StableLM-Zephyr-3B|76.00|6.64|4.31<sup>#</sup>|
|Rocket-3B|79.75|6.56|4.07<sup>#</sup>|
|Qwen-Chat-1.8B|-|-|5.65|
||
|MiniChat-3B|48.82|-|-|
|MiniChat-2-3B|77.30|6.23|6.04|

<sup>#</sup> specialized mainly for English.

<sup>$</sup> finetuned without multi-turn instruction data.

The following example code snippet shows how to use MiniChat-2-3B:

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

# `conversation.py` is provided in the MiniMA GitHub repository.
from conversation import get_default_conv_template

# MiniChat
tokenizer = AutoTokenizer.from_pretrained("GeneZC/MiniChat-2-3B", use_fast=False)
# GPU.
model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-2-3B", use_cache=True, device_map="auto", torch_dtype=torch.float16).eval()
# CPU.
# model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-2-3B", use_cache=True, device_map="cpu", torch_dtype=torch.float16).eval()

conv = get_default_conv_template("minichat")

question = "Implement a program to find the common elements in two arrays without using any extra data structures."
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = tokenizer([prompt]).input_ids
output_ids = model.generate(
    torch.as_tensor(input_ids).cuda(),  # move inputs to GPU; drop `.cuda()` for the CPU path
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
output_ids = output_ids[0][len(input_ids[0]):]  # keep only the newly generated tokens
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
# output: "def common_elements(arr1, arr2):\n if len(arr1) == 0:\n return []\n if len(arr2) == 0:\n return arr1\n\n common_elements = []\n for element in arr1:\n if element in arr2:\n common_elements.append(element)\n\n return common_elements"
# Multi-turn conversation can be realized by continuously appending questions to `conv`.
```
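
If the `conversation` helper is not on hand, the prompt can also be assembled by hand. The sketch below reproduces the single-turn template shown in the `widget` entry of the frontmatter (`<s> [|User|] ... </s>[|Assistant|]`); the function name is ours, and the multi-turn continuation is an assumption extrapolated from that template rather than the reference implementation:

```python
def build_minichat_prompt(turns):
    """Build a MiniChat-style prompt from (user, assistant_or_None) pairs.

    Hypothetical helper: the template is inferred from the widget example
    `<s> [|User|] Hi 👋 </s>[|Assistant|]`, not taken from conversation.py.
    """
    prompt = "<s>"
    for user, assistant in turns:
        # Open a user turn and leave the assistant role tag as the cue to generate.
        prompt += f" [|User|] {user} </s>[|Assistant|]"
        if assistant is not None:
            # Close a completed assistant turn before the next user message.
            prompt += f" {assistant} </s>"
    return prompt

# A single open turn reproduces the widget example exactly:
print(build_minichat_prompt([("Hi 👋", None)]))
# <s> [|User|] Hi 👋 </s>[|Assistant|]
```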

## Bibtex

```bibtex
@article{zhang2023law,
    title={Towards the Law of Capacity Gap in Distilling Language Models},
    author={Zhang, Chen and Song, Dawei and Ye, Zheyu and Gao, Yan},
    year={2023},
    url={https://arxiv.org/abs/2311.07052}
}
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_GeneZC__MiniChat-2-3B).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |51.49|
|AI2 Reasoning Challenge (25-Shot)|44.88|
|HellaSwag (10-Shot)              |67.69|
|MMLU (5-Shot)                    |47.59|
|TruthfulQA (0-shot)              |49.64|
|Winogrande (5-shot)              |66.46|
|GSM8k (5-shot)                   |32.68|
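
The reported average is simply the arithmetic mean of the six benchmark scores, which is easy to verify:

```python
# Six leaderboard scores from the table above.
scores = [44.88, 67.69, 47.59, 49.64, 66.46, 32.68]
avg = round(sum(scores) / len(scores), 2)
print(avg)  # 51.49
```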