RichardErkhov committed on
Commit 93d58b3
1 Parent(s): 1b20d33

uploaded readme

Files changed (1)
  1. README.md +225 -0
README.md ADDED
@@ -0,0 +1,225 @@
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


firefly-qwen1.5-en-7b-unsloth - GGUF
- Model creator: https://huggingface.co/YeungNLP/
- Original model: https://huggingface.co/YeungNLP/firefly-qwen1.5-en-7b-unsloth/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [firefly-qwen1.5-en-7b-unsloth.Q2_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q2_K.gguf) | Q2_K | 2.89GB |
| [firefly-qwen1.5-en-7b-unsloth.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.IQ3_XS.gguf) | IQ3_XS | 3.18GB |
| [firefly-qwen1.5-en-7b-unsloth.IQ3_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.IQ3_S.gguf) | IQ3_S | 3.32GB |
| [firefly-qwen1.5-en-7b-unsloth.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q3_K_S.gguf) | Q3_K_S | 3.32GB |
| [firefly-qwen1.5-en-7b-unsloth.IQ3_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.IQ3_M.gguf) | IQ3_M | 3.48GB |
| [firefly-qwen1.5-en-7b-unsloth.Q3_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q3_K.gguf) | Q3_K | 3.65GB |
| [firefly-qwen1.5-en-7b-unsloth.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q3_K_M.gguf) | Q3_K_M | 3.65GB |
| [firefly-qwen1.5-en-7b-unsloth.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q3_K_L.gguf) | Q3_K_L | 3.93GB |
| [firefly-qwen1.5-en-7b-unsloth.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.IQ4_XS.gguf) | IQ4_XS | 4.02GB |
| [firefly-qwen1.5-en-7b-unsloth.Q4_0.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q4_0.gguf) | Q4_0 | 4.2GB |
| [firefly-qwen1.5-en-7b-unsloth.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.IQ4_NL.gguf) | IQ4_NL | 4.22GB |
| [firefly-qwen1.5-en-7b-unsloth.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q4_K_S.gguf) | Q4_K_S | 4.23GB |
| [firefly-qwen1.5-en-7b-unsloth.Q4_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q4_K.gguf) | Q4_K | 3.95GB |
| [firefly-qwen1.5-en-7b-unsloth.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q4_K_M.gguf) | Q4_K_M | 0.95GB |
| [firefly-qwen1.5-en-7b-unsloth.Q4_1.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q4_1.gguf) | Q4_1 | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q5_0.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q5_0.gguf) | Q5_0 | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q5_K_S.gguf) | Q5_K_S | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q5_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q5_K.gguf) | Q5_K | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q5_K_M.gguf) | Q5_K_M | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q5_1.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q5_1.gguf) | Q5_1 | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q6_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q6_K.gguf) | Q6_K | 0.01GB |
| [firefly-qwen1.5-en-7b-unsloth.Q8_0.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf/blob/main/firefly-qwen1.5-en-7b-unsloth.Q8_0.gguf) | Q8_0 | 0.01GB |
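
The GGUF files above can be run with any GGUF-compatible runtime. The sketch below shows one possible way, assuming the third-party `llama-cpp-python` package (plus `huggingface_hub` for downloading) is installed. The helper names `gguf_filename` and `chat` are hypothetical, and the file-name pattern is taken from the table.

```python
# Minimal sketch: run one of the GGUF quants with llama-cpp-python (assumed installed).
REPO_ID = "RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-unsloth-gguf"
BASE_NAME = "firefly-qwen1.5-en-7b-unsloth"


def gguf_filename(quant: str) -> str:
    """Hypothetical helper: build a file name following the pattern used in the table."""
    return f"{BASE_NAME}.{quant}.gguf"


def chat(user_message: str, quant: str = "Q4_K_S", max_tokens: int = 512) -> str:
    """Hypothetical helper: download the chosen quant and answer one user message."""
    from llama_cpp import Llama  # imported lazily so gguf_filename works without it

    llm = Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        chat_format="chatml",  # same ChatML layout as Qwen1.5-7B-Chat
        n_ctx=2048,
    )
    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        max_tokens=max_tokens,
        temperature=0.35,
        top_p=0.9,
    )
    return result["choices"][0]["message"]["content"]
```

For example, `chat("hello, who are you?")` fetches the Q4_K_S file into the Hugging Face cache on first use and returns the assistant's reply as a string.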


Original model description:
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen1.5-7B
---

## Unsloth x Qwen2
[Unsloth](https://github.com/unslothai/unsloth) can speed up LLM training and reduce memory usage, but it currently supports only Llama3, Mistral, Gemma, ORPO, Phi-3 and TinyLlama.
This means Qwen2 cannot be trained with the official Unsloth, even though Qwen2 is popular in the community.

We have added Qwen2 support to Unsloth, which speeds up training and substantially reduces memory usage.
If you want to train Qwen2 with Unsloth, you can use [our repo](https://github.com/yangjianxin1/unsloth) instead of the official one. We also plan to contribute our code to the [official repo](https://github.com/unslothai/unsloth).

Install our fork of Unsloth:
```bash
pip install git+https://github.com/yangjianxin1/unsloth.git
```
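
Once the fork is installed, loading Qwen1.5-7B for QLoRA training could look roughly like the sketch below. It assumes the fork keeps upstream Unsloth's `FastLanguageModel` API (`from_pretrained` and `get_peft_model`). The LoRA values mirror the Training Setting listed later in this card, while `target_modules` and the helper name `load_qwen_for_qlora` are our assumptions.

```python
# Sketch, assuming the fork exposes upstream Unsloth's FastLanguageModel API.
# r / lora_alpha / lora_dropout / gradient checkpointing follow this card's Training Setting;
# target_modules is an assumption (attention and MLP projections of the Qwen2 architecture).
LORA_KWARGS = dict(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)


def load_qwen_for_qlora(model_name: str = "Qwen/Qwen1.5-7B", max_seq_length: int = 2048):
    """Hypothetical helper: load the base model in 4 bit and attach LoRA adapters."""
    from unsloth import FastLanguageModel  # imported lazily; needs a CUDA GPU

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        dtype=None,         # None lets Unsloth auto-detect (fp16 on a V100, which lacks bf16)
        load_in_4bit=True,  # QLoRA: keep the frozen base weights in 4 bit
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer
```

The returned model and tokenizer can then be handed to a standard Hugging Face or TRL trainer; Firefly wraps this setup in its own training scripts.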

[Firefly](https://github.com/yangjianxin1/Firefly) already supports training Qwen2 with Unsloth, and the models below were trained with Firefly; give it a try.


## Model Card for Firefly-Qwen1.5-Unsloth
[firefly-qwen1.5-en-7b-unsloth](https://huggingface.co/YeungNLP/firefly-qwen1.5-en-7b-unsloth) and [firefly-qwen1.5-en-7b-dpo-v0.1-unsloth](https://huggingface.co/YeungNLP/firefly-qwen1.5-en-7b-dpo-v0.1-unsloth) are trained from [Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) to act as a helpful and harmless AI assistant.
We use [Firefly](https://github.com/yangjianxin1/Firefly) to train our models on **a single V100 GPU** with QLoRA and [Unsloth](https://github.com/yangjianxin1/unsloth).
firefly-qwen1.5-en-7b-unsloth is fine-tuned from Qwen1.5-7B on English instruction data, and firefly-qwen1.5-en-7b-dpo-v0.1-unsloth is further trained from firefly-qwen1.5-en-7b-unsloth with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).

On the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), both of our models achieve a higher average score than the official [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) and [Gemma-7B-it](https://huggingface.co/google/gemma-7b-it), and the DPO model also outperforms [Zephyr-7B-Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).

Although our models are trained on English data, you can also try chatting with them in Chinese, since Qwen1.5 is also strong in Chinese. However, we have not yet evaluated their performance in Chinese.

We recommend installing `transformers>=4.37.0`, since earlier versions do not include the Qwen2 architecture that Qwen1.5 uses.

## Performance
We measured the training gains on Qwen1.5-7B by training with QLoRA for 20 steps on a single V100, with and without Unsloth. The results are listed below.
**In the largest setting (max_seq_length 2048, per-device batch size 4), Unsloth reduces GPU memory by 39.13% and training time by 32.12%, which corresponds to a 47.32% increase in training speed.**

| max_seq_length | per_device_train_batch_size | gradient_accumulation_steps | use_unsloth | LoRA rank | GPU memory | Training time |
|----------------|----------------------------|-----------------------------|-------------|-----------|-------------------------|-------------------|
| 1024 | 1 | 16 | false | 8 | 13.72GB | 448s |
| 1024 | 1 | 16 | true | 8 | **8.43GB**(**-38.56%**) | 308s(**-31.25%**) |
| 1024 | 1 | 16 | false | 64 | 16.01GB | 452s |
| 1024 | 1 | 16 | true | 64 | 11.07GB(**-30.86%**) | 311s(**-31.19%**) |
| 2048 | 1 | 16 | false | 64 | 18.55GB | 840s |
| 2048 | 1 | 16 | true | 64 | 12.99GB(**-29.97%**) | 596s(**-29.05%**) |
| 1024 | 4 | 4 | false | 64 | 24.70GB | 357s |
| 1024 | 4 | 4 | true | 64 | 14.36GB(**-41.86%**) | 253s(**-29.13%**) |
| 2048 | 4 | 4 | false | 64 | 32.51GB | 741s |
| 2048 | 4 | 4 | true | 64 | 19.79GB(**-39.13%**) | 503s(**-32.12%**) |
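
The percentages in this table follow from simple ratios: reduction = (baseline - Unsloth) / baseline, and speed increase = baseline time / Unsloth time - 1. A short sketch reproducing the figures for the last pair of rows (the function names are ours, for illustration only):

```python
def reduction_pct(baseline: float, unsloth: float) -> float:
    """Relative reduction in percent, e.g. of GPU memory or training time."""
    return round((baseline - unsloth) / baseline * 100, 2)


def speedup_pct(baseline_time: float, unsloth_time: float) -> float:
    """Increase in training speed in percent (speed is the inverse of time)."""
    return round((baseline_time / unsloth_time - 1) * 100, 2)


# Last row pair: max_seq_length=2048, batch size 4, gradient accumulation 4, rank 64
memory_saving = reduction_pct(32.51, 19.79)  # GB
time_saving = reduction_pct(741, 503)        # seconds
speed_gain = speedup_pct(741, 503)
print(memory_saving, time_saving, speed_gain)  # 39.13 32.12 47.32
```

Note that a 32.12% cut in time is a 47.32% gain in speed, because speed is measured against the shorter Unsloth time.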


We evaluated our SFT and DPO models on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard); the results are shown below.

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|--------------------------------------------|---------|--------|-----------|-------|------------|------------|--------|
| firefly-gemma-7b | 62.93 | 62.12 | 79.77 | 61.57 | 49.41 | 75.45 | 49.28 |
| **firefly-qwen1.5-en-7b-dpo-v0.1-unsloth** | 62.65 | 56.14 | 75.5 | 60.87 | 58.09 | 70.72 | 54.59 |
| zephyr-7b-beta | 61.95 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 29.04 |
| **firefly-qwen1.5-en-7b-unsloth** | 61.81 | 54.27 | 76.22 | 61.55 | 50.62 | 70.48 | 57.7 |
| vicuna-13b-v1.5 | 55.41 | 57.08 | 81.24 | 56.67 | 51.51 | 74.66 | 11.3 |
| Xwin-LM-13B-V0.1 | 55.29 | 62.54 | 82.8 | 56.53 | 45.96 | 74.27 | 9.63 |
| Qwen1.5-7B-Chat | 55.15 | 55.89 | 78.56 | 61.65 | 53.54 | 67.72 | 13.57 |
| gemma-7b-it | 53.56 | 51.45 | 71.96 | 53.52 | 47.29 | 67.96 | 29.19 |
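
The Average column appears to be the plain arithmetic mean of the six benchmark scores. Recomputing it for the two Firefly Qwen1.5 rows reproduces the reported values:

```python
from statistics import mean

# Scores copied from the table: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
scores = {
    "firefly-qwen1.5-en-7b-dpo-v0.1-unsloth": [56.14, 75.5, 60.87, 58.09, 70.72, 54.59],
    "firefly-qwen1.5-en-7b-unsloth": [54.27, 76.22, 61.55, 50.62, 70.48, 57.7],
}

# Unweighted mean over the six benchmarks, rounded like the table
averages = {name: round(mean(values), 2) for name, values in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Because the mean is unweighted, the DPO model's large TruthfulQA gain (58.09 vs 50.62) outweighs its slightly lower GSM8K score.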



## Usage
Our chat models use the same chat template as the official Qwen1.5-7B-Chat:
```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant
I am an AI program developed by Firefly<|im_end|>
```
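
The template above is the ChatML format. The sketch below renders a list of messages into this layout by hand; `to_chatml` is a hypothetical helper and only approximates what `tokenizer.apply_chat_template` produces (for example, it never inserts a default system message).

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}] into the ChatML layout shown above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello, who are you?"},
])
print(prompt)
```

Generation should stop at the `<|im_end|>` token, which is why the code below passes it as `eos_token_id`.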

You can run inference with the [chat script in Firefly](https://github.com/yangjianxin1/Firefly/blob/master/script/chat/chat.py).

You can also use the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_path = "YeungNLP/firefly-qwen1.5-en-7b-unsloth"
# Load the model in fp16 and let accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions. "
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the messages with the ChatML template and open the assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # passes both input_ids and attention_mask
    max_new_tokens=1500,
    do_sample=True,  # needed for top_p / temperature to take effect
    top_p=0.9,
    temperature=0.35,
    repetition_penalty=1.0,
    # stop at the end-of-turn token
    eos_token_id=tokenizer.encode('<|im_end|>', add_special_tokens=False)
)
# Keep only the newly generated tokens (drop the prompt)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Training Details
In both the SFT and DPO stages, **we use only a single V100 GPU** with QLoRA and Unsloth, and we train our models with [Firefly](https://github.com/yangjianxin1/Firefly).

### Training Setting
The following hyperparameters were used during SFT:
- num_epochs: 1
- learning_rate: 2e-4
- total_train_batch_size: 32
- max_seq_length: 2048
- optimizer: paged_adamw_32bit
- lr_scheduler_type: constant_with_warmup
- warmup_steps: 600
- lora_rank: 64
- lora_alpha: 16
- lora_dropout: 0.05
- gradient_checkpointing: true
- fp16: true
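
`constant_with_warmup` raises the learning rate linearly from 0 to its peak over `warmup_steps` optimizer steps and then holds it constant. A small sketch of that schedule, written to match our understanding of `get_constant_schedule_with_warmup` in transformers, using the SFT values above (the function name is hypothetical):

```python
def constant_with_warmup_lr(step: int, peak_lr: float = 2e-4, warmup_steps: int = 600) -> float:
    """Learning rate at a given optimizer step: linear warmup, then constant."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr


for step in (0, 150, 300, 600, 1000):
    print(step, constant_with_warmup_lr(step))
```

For the DPO stage below, the same function applies with `warmup_steps=100`.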

The following hyperparameters were used during DPO:
- num_epochs: 1
- learning_rate: 2e-4
- total_train_batch_size: 32
- max_seq_length: 2048
- max_prompt_length: 500
- optimizer: paged_adamw_32bit
- lr_scheduler_type: constant_with_warmup
- warmup_steps: 100
- lora_rank: 64
- lora_alpha: 16
- lora_dropout: 0.05
- gradient_checkpointing: true
- fp16: true
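
For context, the sigmoid DPO loss from the paper linked above compares how much the policy raises the log-probability of the chosen response, relative to a frozen reference model, with the same quantity for the rejected response. The per-example sketch below assumes β = 0.1 (the card does not state β) and sequence-level log-probabilities; the function name `dpo_loss` is hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> tuple[float, float, float]:
    """Sigmoid DPO loss for one preference pair; beta=0.1 is an assumption.

    Returns (loss, reward_chosen, reward_rejected), where each reward is
    beta times the policy/reference log-probability ratio.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin), reward_chosen, reward_rejected
```

When the policy still equals the reference model, both rewards are 0 and the loss is log 2 ≈ 0.693, a useful baseline when reading the loss column in the metrics table below.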


### Training metrics

The table below shows the full set of DPO training metrics:

| Epoch | Step | Loss | Rewards/accuracies | Rewards/margins | Rewards/chosen | Rewards/rejected | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected |
|-------|------|--------|--------------------|-----------------|----------------|------------------|---------------|-----------------|--------------|----------------|
| 0.05 | 100 | 0.6128 | 0.6572 | 0.3914 | -0.0622 | -0.4537 | 1.107 | 1.1104 | -283.7632 | -264.5925 |
| 0.1 | 200 | 0.6066 | 0.6913 | 0.662 | -0.3589 | -1.0209 | 0.9433 | 0.9431 | -279.0002 | -268.6432 |
| 0.16 | 300 | 0.5803 | 0.7069 | 0.876 | -0.3849 | -1.2609 | 0.8411 | 0.8537 | -289.9482 | -274.3425 |
| 0.21 | 400 | 0.5624 | 0.7169 | 0.9575 | -0.2447 | -1.2022 | 0.7615 | 0.7497 | -293.8072 | -274.4167 |
| 0.26 | 500 | 0.5863 | 0.7 | 0.8908 | -0.5283 | -1.4191 | 0.537 | 0.5085 | -284.3388 | -267.9294 |
| 0.31 | 600 | 0.5612 | 0.7166 | 1.0791 | -0.592 | -1.6711 | 0.7121 | 0.7219 | -293.2425 | -278.5992 |
| 0.37 | 700 | 0.5741 | 0.7234 | 1.0742 | -0.8469 | -1.9211 | 0.6002 | 0.5769 | -300.8099 | -285.9137 |
| 0.42 | 800 | 0.582 | 0.7141 | 1.0414 | -1.1658 | -2.2072 | 0.7191 | 0.5934 | -300.458 | -286.1 |
| 0.47 | 900 | 0.5694 | 0.7178 | 1.2055 | -1.7372 | -2.9426 | 0.4226 | 0.316 | -305.5303 | -290.7548 |
| 0.52 | 1000 | 0.5827 | 0.7134 | 1.1063 | -1.354 | -2.4603 | 0.535 | 0.4022 | -302.7598 | -286.636 |
| 0.58 | 1100 | 0.5553 | 0.7306 | 1.3631 | -1.5861 | -2.9492 | 0.7636 | 0.6559 | -312.9375 | -290.3474 |
| 0.63 | 1200 | 0.5633 | 0.7341 | 1.2689 | -1.7187 | -2.9876 | 0.6555 | 0.5894 | -315.0179 | -298.2406 |
| 0.68 | 1300 | 0.5705 | 0.7284 | 1.3501 | -1.7762 | -3.1263 | 0.7419 | 0.6874 | -310.9056 | -294.2934 |
| 0.73 | 1400 | 0.5458 | 0.7347 | 1.4555 | -2.2377 | -3.6932 | 0.7279 | 0.6564 | -309.141 | -299.1613 |
| 0.79 | 1500 | 0.5797 | 0.7222 | 1.2937 | -2.4483 | -3.742 | 0.8444 | 0.771 | -321.578 | -298.111 |
| 0.84 | 1600 | 0.5572 | 0.7319 | 1.4824 | -2.9344 | -4.4168 | 0.9202 | 0.8605 | -323.4034 | -307.0114 |
| 0.89 | 1700 | 0.5518 | 0.7281 | 1.4263 | -2.7301 | -4.1564 | 0.9257 | 0.8785 | -313.694 | -298.1267 |
| 0.94 | 1800 | 0.5572 | 0.7272 | 1.5121 | -2.9505 | -4.4627 | 0.7899 | 0.7503 | -314.1552 | -305.9873 |
| 0.99 | 1900 | 0.5763 | 0.7241 | 1.4982 | -2.7064 | -4.2047 | 0.7841 | 0.7023 | -310.6677 | -299.5064 |
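
These column names match the metrics logged by TRL's `DPOTrainer`, where `Rewards/margins` is the batch mean of `Rewards/chosen - Rewards/rejected`. Since a mean of differences equals the difference of means, the columns can be cross-checked; a quick check on a few rows copied from the table:

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from the table
rows = [
    (100, -0.0622, -0.4537, 0.3914),
    (1000, -1.354, -2.4603, 1.1063),
    (1900, -2.7064, -4.2047, 1.4982),
]

for step, chosen, rejected, margin in rows:
    diff = chosen - rejected
    # The logged margin should match chosen - rejected up to the table's 4-decimal rounding
    print(step, round(diff, 4), margin, abs(diff - margin) < 1e-3)
```

The margin grows from about 0.39 to about 1.5 over the epoch, mostly because rewards on rejected responses fall faster than rewards on chosen ones.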