Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

MiniChat-1.5-3B - bnb 4bits
- Model creator: https://huggingface.co/GeneZC/
- Original model: https://huggingface.co/GeneZC/MiniChat-1.5-3B/
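This repository holds a bitsandbytes 4-bit quantization of the model, so it can be loaded directly with `transformers`. A minimal loading sketch, in which the repository id and the NF4 quantization settings are assumptions to adjust for your setup (an already-quantized checkpoint can also be loaded without passing a config):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed 4-bit settings (NF4, fp16 compute); tune for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "RichardErkhov/GeneZC_-_MiniChat-1.5-3B-4bits"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
).eval()
```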
Original model description:
---
language:
- en
- zh
license: apache-2.0
library_name: transformers
widget:
- text: <s> [|User|] Hi 👋 </s>[|Assistant|]
model-index:
- name: MiniChat-1.5-3B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 46.5
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-1.5-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 68.28
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-1.5-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.67
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-1.5-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 50.71
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-1.5-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.04
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-1.5-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.18
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=GeneZC/MiniChat-1.5-3B
      name: Open LLM Leaderboard
---

## MiniChat-1.5-3B

📑 [arXiv](https://arxiv.org/abs/2311.07052) | 👻 [GitHub](https://github.com/GeneZC/MiniMA) | 🤗 [HuggingFace-MiniMA](https://huggingface.co/GeneZC/MiniMA-3B) | 🤗 [HuggingFace-MiniChat](https://huggingface.co/GeneZC/MiniChat-3B) | 🤗 [HuggingFace-MiniChat-1.5](https://huggingface.co/GeneZC/MiniChat-1.5-3B) | 🤖 [ModelScope-MiniMA](https://modelscope.cn/models/GeneZC/MiniMA-3B) | 🤖 [ModelScope-MiniChat](https://modelscope.cn/models/GeneZC/MiniChat-3B)

🆕 **Updates from MiniChat-3B**:
- better data mixture;
- use of [NEFTune](https://arxiv.org/abs/2310.05914);
- use of [DPO](https://arxiv.org/abs/2305.18290).

❗ Use of this model must comply with the LICENSE of LLaMA2, since it is derived from LLaMA2.

A language model distilled and finetuned from an adapted version of LLaMA2-7B, following "Towards the Law of Capacity Gap in Distilling Language Models". It outperforms a wide range of 3B competitors in GPT4 evaluation and even competes with several 7B chat models.

<img src="./teaser_b.jpg" alt="teaser_b" width="687" />

The following is an example code snippet for using MiniChat-3B:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# `conversation.py` is shipped alongside the model files in the repository.
from conversation import get_default_conv_template

# MiniChat
tokenizer = AutoTokenizer.from_pretrained("GeneZC/MiniChat-3B", use_fast=False)
# GPU.
model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-3B", use_cache=True, device_map="auto", torch_dtype=torch.float16).eval()
# CPU.
# model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-3B", use_cache=True, device_map="cpu", torch_dtype=torch.float16).eval()

conv = get_default_conv_template("minichat")

question = "Implement a program to find the common elements in two arrays without using any extra data structures."
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = tokenizer([prompt]).input_ids
output_ids = model.generate(
    torch.as_tensor(input_ids).cuda(),
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
output_ids = output_ids[0][len(input_ids[0]):]
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
# output: "def common_elements(arr1, arr2):\n    if len(arr1) == 0:\n        return []\n    if len(arr2) == 0:\n        return arr1\n\n    common_elements = []\n    for element in arr1:\n        if element in arr2:\n            common_elements.append(element)\n\n    return common_elements"
# Multiturn conversation could be realized by continuously appending questions to `conv`.
```
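
The widget string in the metadata (`<s> [|User|] Hi 👋 </s>[|Assistant|]`) indicates the underlying chat format that `get_prompt` produces. As a rough illustration only, here is a hypothetical helper that assembles such a prompt by hand; the multiturn serialization (closing each assistant reply with `</s>`) is an assumption, not the repository's own implementation:

```python
def build_minichat_prompt(turns):
    """Assemble a MiniChat-style prompt from (user, assistant) turn pairs.

    The last pair may carry assistant=None to request a new completion.
    Assumed format, mirroring the widget template
    `<s> [|User|] ... </s>[|Assistant|]`.
    """
    prompt = ""
    for user_msg, assistant_msg in turns:
        prompt += f"<s> [|User|] {user_msg} </s>[|Assistant|]"
        if assistant_msg is not None:
            prompt += f" {assistant_msg} </s>"
    return prompt

# Single-turn prompt, matching the widget string above.
print(build_minichat_prompt([("Hi 👋", None)]))
# -> <s> [|User|] Hi 👋 </s>[|Assistant|]
```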

## Bibtex

```bibtex
@article{zhang2023law,
    title={Towards the Law of Capacity Gap in Distilling Language Models},
    author={Zhang, Chen and Song, Dawei and Ye, Zheyu and Gao, Yan},
    year={2023},
    url={https://arxiv.org/abs/2311.07052}
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_GeneZC__MiniChat-1.5-3B).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 50.23 |
| AI2 Reasoning Challenge (25-Shot) | 46.50 |
| HellaSwag (10-Shot)               | 68.28 |
| MMLU (5-Shot)                     | 46.67 |
| TruthfulQA (0-shot)               | 50.71 |
| Winogrande (5-shot)               | 65.04 |
| GSM8k (5-shot)                    | 24.18 |
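
The Avg. row is the plain arithmetic mean of the six task scores, which can be checked directly:

```python
# Scores from the leaderboard table above.
scores = [46.50, 68.28, 46.67, 50.71, 65.04, 24.18]
avg = sum(scores) / len(scores)
print(round(avg, 2))  # -> 50.23
```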