RichardErkhov committed on
Commit bf9eab3
1 Parent(s): 7abe461

uploaded readme

Files changed (1):
1. README.md +207 -0

README.md ADDED
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

firefly-qwen1.5-en-7b-dpo-v0.1 - GGUF
- Model creator: https://huggingface.co/YeungNLP/
- Original model: https://huggingface.co/YeungNLP/firefly-qwen1.5-en-7b-dpo-v0.1/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q2_K.gguf) | Q2_K | 2.89GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.IQ3_XS.gguf) | IQ3_XS | 3.18GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.IQ3_S.gguf) | IQ3_S | 3.32GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K_S.gguf) | Q3_K_S | 3.32GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.IQ3_M.gguf) | IQ3_M | 3.48GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K.gguf) | Q3_K | 3.65GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K_M.gguf) | Q3_K_M | 3.65GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q3_K_L.gguf) | Q3_K_L | 3.93GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.IQ4_XS.gguf) | IQ4_XS | 4.02GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q4_0.gguf) | Q4_0 | 4.2GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.IQ4_NL.gguf) | IQ4_NL | 4.22GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q4_K_S.gguf) | Q4_K_S | 4.23GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q4_K.gguf) | Q4_K | 4.44GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q4_K_M.gguf) | Q4_K_M | 4.44GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q4_1.gguf) | Q4_1 | 4.62GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q5_0.gguf) | Q5_0 | 5.03GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q5_K_S.gguf) | Q5_K_S | 5.03GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q5_K.gguf) | Q5_K | 5.15GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q5_K_M.gguf) | Q5_K_M | 5.15GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q5_1.gguf) | Q5_1 | 5.44GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q6_K.gguf) | Q6_K | 5.91GB |
| [firefly-qwen1.5-en-7b-dpo-v0.1.Q8_0.gguf](https://huggingface.co/RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf/blob/main/firefly-qwen1.5-en-7b-dpo-v0.1.Q8_0.gguf) | Q8_0 | 7.65GB |

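The size column can be turned into a rough selection helper. This is a minimal sketch under stated assumptions: it treats the file size as the main memory cost and reserves a hypothetical `overhead_gb` allowance for the context (KV cache) and runtime buffers, whose real value depends on your runtime and context length. File size is only a loose proxy for output quality, and the helper names are illustrative, not part of any library.

```python
from typing import Optional

# Repo ID and file-name prefix taken from the table links above.
REPO_ID = "RichardErkhov/YeungNLP_-_firefly-qwen1.5-en-7b-dpo-v0.1-gguf"
BASENAME = "firefly-qwen1.5-en-7b-dpo-v0.1"

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.89, "IQ3_XS": 3.18, "IQ3_S": 3.32, "Q3_K_S": 3.32,
    "IQ3_M": 3.48, "Q3_K": 3.65, "Q3_K_M": 3.65, "Q3_K_L": 3.93,
    "IQ4_XS": 4.02, "Q4_0": 4.2, "IQ4_NL": 4.22, "Q4_K_S": 4.23,
    "Q4_K": 4.44, "Q4_K_M": 4.44, "Q4_1": 4.62, "Q5_0": 5.03,
    "Q5_K_S": 5.03, "Q5_K": 5.15, "Q5_K_M": 5.15, "Q5_1": 5.44,
    "Q6_K": 5.91, "Q8_0": 7.65,
}


def pick_quant(memory_gb: float, overhead_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant whose file fits in memory_gb - overhead_gb, or None.

    overhead_gb is a hypothetical allowance for context and runtime buffers.
    """
    budget = memory_gb - overhead_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None
    # max() compares (size, name) tuples, so equal sizes are broken alphabetically.
    return max(fitting)[1]


def gguf_url(quant: str) -> str:
    """Direct-download URL: the table links use /blob/, raw files are served from /resolve/."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{BASENAME}.{quant}.gguf"


choice = pick_quant(memory_gb=8.0)
print(choice, gguf_url(choice))
```

With 8 GB and the default 1 GB allowance this picks Q6_K. The same repo ID and file name can also be passed to `huggingface_hub.hf_hub_download(repo_id=..., filename=...)` and the downloaded file loaded by a GGUF-capable runtime such as llama.cpp.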
Original model description:
---
library_name: transformers
license: apache-2.0
basemodel: Qwen/Qwen1.5-7B
---

## Model Card for Firefly-Qwen1.5

[firefly-qwen1.5-en-7b](https://huggingface.co/YeungNLP/firefly-qwen1.5-en-7b) and [firefly-qwen1.5-en-7b-dpo-v0.1](https://huggingface.co/YeungNLP/firefly-qwen1.5-en-7b-dpo-v0.1) are trained from [Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) to act as a helpful and harmless AI assistant.
We use [Firefly](https://github.com/yangjianxin1/Firefly) to train our models on **a single V100 GPU** with QLoRA.
firefly-qwen1.5-en-7b is fine-tuned from Qwen1.5-7B on English instruction data, and firefly-qwen1.5-en-7b-dpo-v0.1 is trained from firefly-qwen1.5-en-7b with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).

On the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), both of our models outperform the official [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) and [Gemma-7B-it](https://huggingface.co/google/gemma-7b-it) in average score, and firefly-qwen1.5-en-7b-dpo-v0.1 also outperforms [Zephyr-7B-Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).

<img src="pics/open_llm.png" width="800">

Although our models are trained on English data, you can also try chatting with them in Chinese, since Qwen1.5 is also strong in Chinese. However, we have not yet evaluated their performance in Chinese.

We recommend installing `transformers>=4.37.0`.

## Performance
We evaluated our models on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). firefly-qwen1.5-en-7b-dpo-v0.1 reaches an average score of 62.36, second only to firefly-gemma-7b (62.93) among the models below.

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-----------------------------------|--------|--------|-----------|--------|------------|------------|--------|
| firefly-gemma-7b | 62.93 | 62.12 | 79.77 | 61.57 | 49.41 | 75.45 | 49.28 |
| **firefly-qwen1.5-en-7b-dpo-v0.1** | 62.36 | 54.35 | 76.04 | 61.21 | 56.4 | 72.06 | 54.13 |
| zephyr-7b-beta | 61.95 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 29.04 |
| **firefly-qwen1.5-en-7b** | 61.44 | 53.41 | 75.51 | 61.67 | 51.96 | 70.72 | 55.34 |
| vicuna-13b-v1.5 | 55.41 | 57.08 | 81.24 | 56.67 | 51.51 | 74.66 | 11.3 |
| Xwin-LM-13B-V0.1 | 55.29 | 62.54 | 82.8 | 56.53 | 45.96 | 74.27 | 9.63 |
| Qwen1.5-7B-Chat | 55.15 | 55.89 | 78.56 | 61.65 | 53.54 | 67.72 | 13.57 |
| gemma-7b-it | 53.56 | 51.45 | 71.96 | 53.52 | 47.29 | 67.96 | 29.19 |

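The Average column appears to be the unweighted mean of the six benchmark scores, which is how the Open LLM Leaderboard computed it at the time. The sketch below, with scores copied from the table, checks that assumption and computes how each benchmark changed from the SFT model to the DPO model; the function names are illustrative.

```python
# Scores copied from the table above.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
SCORES = {  # model: (reported Average, [six benchmark scores in BENCHMARKS order])
    "firefly-gemma-7b": (62.93, [62.12, 79.77, 61.57, 49.41, 75.45, 49.28]),
    "firefly-qwen1.5-en-7b-dpo-v0.1": (62.36, [54.35, 76.04, 61.21, 56.4, 72.06, 54.13]),
    "zephyr-7b-beta": (61.95, [62.03, 84.36, 61.07, 57.45, 77.74, 29.04]),
    "firefly-qwen1.5-en-7b": (61.44, [53.41, 75.51, 61.67, 51.96, 70.72, 55.34]),
    "vicuna-13b-v1.5": (55.41, [57.08, 81.24, 56.67, 51.51, 74.66, 11.3]),
    "Xwin-LM-13B-V0.1": (55.29, [62.54, 82.8, 56.53, 45.96, 74.27, 9.63]),
    "Qwen1.5-7B-Chat": (55.15, [55.89, 78.56, 61.65, 53.54, 67.72, 13.57]),
    "gemma-7b-it": (53.56, [51.45, 71.96, 53.52, 47.29, 67.96, 29.19]),
}


def mismatched_averages(scores, tol=0.01):
    """Models whose reported Average is not the plain mean of their six scores (within tol)."""
    return [
        model
        for model, (reported, values) in scores.items()
        if abs(sum(values) / len(values) - reported) > tol
    ]


def benchmark_deltas(scores, base, other):
    """Per-benchmark score change from `base` to `other`, rounded to 2 decimals."""
    return {
        name: round(b - a, 2)
        for name, a, b in zip(BENCHMARKS, scores[base][1], scores[other][1])
    }


print(mismatched_averages(SCORES))
print(benchmark_deltas(SCORES, "firefly-qwen1.5-en-7b", "firefly-qwen1.5-en-7b-dpo-v0.1"))
```

With these numbers every reported average matches the mean to within rounding, and the SFT-to-DPO change is largest on TruthfulQA (+4.44), with small drops on MMLU (-0.46) and GSM8K (-1.21).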
## Usage
Our chat models use the same chat template as the official Qwen1.5-7B-Chat:
```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant
I am an AI program developed by Firefly<|im_end|>
```

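When a runtime does not apply the chat template for you (for example, when sending raw text to a GGUF runtime), the layout above can be rendered by hand. A minimal sketch, assuming the template is exactly the one shown: each turn is `<|im_start|>` plus the role, a newline, the content, then `<|im_end|>` and a newline. The helper names are hypothetical, and the sketch does not insert a default system message.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def extract_reply(completion):
    """Keep only the generated text before the first <|im_end|> marker."""
    return completion.split("<|im_end|>", 1)[0].strip()


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello, who are you?"},
])
print(prompt)
```

`extract_reply` plays the same role as passing `<|im_end|>` as the end-of-sequence token in the `transformers` example: generation output is cut at the end of the assistant turn.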
You can run inference with the [chat script](https://github.com/yangjianxin1/Firefly/blob/master/script/chat/chat.py) in Firefly.

You can also use the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_path = "YeungNLP/firefly-qwen1.5-en-7b-dpo-v0.1"
# Load the model in fp16 and let accelerate place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions. "
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the ChatML template and open an assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the model's device instead of hard-coding 'cuda'.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # passes both input_ids and attention_mask
    max_new_tokens=1500,
    do_sample=True,  # temperature and top_p only take effect when sampling
    top_p=0.9,
    temperature=0.35,
    repetition_penalty=1.0,
    # Stop at the end-of-turn token <|im_end|>.
    eos_token_id=tokenizer.encode('<|im_end|>', add_special_tokens=False)
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Training Details
In both the SFT and DPO stages, **we use only a single V100 GPU** with QLoRA, and we train our models with [Firefly](https://github.com/yangjianxin1/Firefly).

### Training Setting
The following hyperparameters were used during SFT:
- num_epochs: 1
- learning_rate: 2e-4
- total_train_batch_size: 32
- max_seq_length: 2048
- optimizer: paged_adamw_32bit
- lr_scheduler_type: constant_with_warmup
- warmup_steps: 700
- lora_rank: 64
- lora_alpha: 16
- lora_dropout: 0.05
- gradient_checkpointing: true
- fp16: true

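Both stages use `constant_with_warmup`: the learning rate rises linearly from 0 to the peak (2e-4 here) over `warmup_steps` and then stays flat. A small sketch of that schedule, written to match the multiplier used by `transformers.get_constant_schedule_with_warmup`; the function name is illustrative. In the DPO metrics table further down, step 200 is logged at epoch 0.1, so the 200-step DPO warmup covers roughly the first tenth of the single DPO epoch.

```python
def constant_with_warmup_lr(step, peak_lr=2e-4, warmup_steps=700):
    """Learning rate at an optimizer step: linear ramp from 0 over warmup_steps, then constant.

    Defaults are the SFT values listed above; DPO used warmup_steps=200.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr


for step in (0, 350, 700, 5000):
    print(step, constant_with_warmup_lr(step))
```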
The following hyperparameters were used during DPO:
- num_epochs: 1
- learning_rate: 2e-4
- total_train_batch_size: 32
- max_seq_length: 1600
- max_prompt_length: 500
- optimizer: paged_adamw_32bit
- lr_scheduler_type: constant_with_warmup
- warmup_steps: 200
- lora_rank: 64
- lora_alpha: 16
- lora_dropout: 0.05
- gradient_checkpointing: true
- fp16: true

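For reference, this is the objective from the DPO paper linked above. Here $x$ is the prompt, $y_w$ and $y_l$ are the chosen and rejected responses, $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model (presumably firefly-qwen1.5-en-7b, since DPO starts from it), and $\beta$ is a temperature whose value this card does not state:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

The two terms inside $\sigma$ are the implicit rewards $r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ of the chosen and rejected responses, and their difference is the reward margin.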
### Training metrics
Training Rewards/margins in DPO:

<img src="pics/margins.png" width="600">

Training Rewards/accuracies in DPO:

<img src="pics/accuracies.png" width="500">

Training loss in DPO:

<img src="pics/loss.png" width="500">

The table below lists the DPO training metrics at every 100 steps:

| Epoch | Step | Loss | Rewards/accuracies | Rewards/margins | Rewards/chosen | Rewards/rejected | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 100 | 0.6231 | 0.6587 | 0.3179 | 0.0404 | -0.2774 | 1.1694 | 1.2377 | -284.5586 | -255.4863 |
| 0.1 | 200 | 0.5945 | 0.6894 | 0.5988 | -0.1704 | -0.7693 | 1.012 | 1.0283 | -284.3049 | -268.1887 |
| 0.16 | 300 | 0.5754 | 0.6981 | 0.8314 | -0.282 | -1.1133 | 0.8912 | 0.8956 | -283.6926 | -270.3117 |
| 0.21 | 400 | 0.5702 | 0.7194 | 0.9369 | -0.1944 | -1.1313 | 0.7255 | 0.7557 | -291.2833 | -273.9706 |
| 0.26 | 500 | 0.5913 | 0.695 | 0.8784 | -0.4524 | -1.3309 | 0.5491 | 0.5535 | -289.5705 | -271.754 |
| 0.31 | 600 | 0.5743 | 0.6994 | 1.0192 | -0.4505 | -1.4698 | 0.6446 | 0.6399 | -296.5292 | -277.824 |
| 0.37 | 700 | 0.5876 | 0.7219 | 1.0471 | -0.6998 | -1.747 | 0.4955 | 0.4329 | -303.7684 | -289.0117 |
| 0.42 | 800 | 0.5831 | 0.715 | 1.0485 | -0.8185 | -1.8671 | 0.5589 | 0.4804 | -295.6313 | -288.0656 |
| 0.47 | 900 | 0.5674 | 0.7119 | 1.1854 | -1.2085 | -2.3939 | 0.3467 | 0.2249 | -302.3643 | -286.2816 |
| 0.52 | 1000 | 0.5794 | 0.7138 | 1.1458 | -0.8423 | -1.9881 | 0.5116 | 0.4248 | -299.3136 | -287.3934 |
| 0.58 | 1100 | 0.5718 | 0.7194 | 1.2897 | -1.4944 | -2.7841 | 0.6392 | 0.5739 | -316.6829 | -294.1148 |
| 0.63 | 1200 | 0.5718 | 0.7275 | 1.2459 | -1.7543 | -3.0002 | 0.4999 | 0.4065 | -316.7873 | -297.8514 |
| 0.68 | 1300 | 0.5789 | 0.72 | 1.3379 | -1.8485 | -3.1864 | 0.4289 | 0.3172 | -314.8326 | -296.8319 |
| 0.73 | 1400 | 0.5462 | 0.7425 | 1.4074 | -1.9865 | -3.3939 | 0.3645 | 0.2333 | -309.4503 | -294.3931 |
| 0.79 | 1500 | 0.5829 | 0.7156 | 1.2582 | -2.1183 | -3.3766 | 0.4193 | 0.2796 | -307.5281 | -292.0817 |
| 0.84 | 1600 | 0.5575 | 0.7375 | 1.471 | -2.1429 | -3.6139 | 0.6547 | 0.5152 | -310.9912 | -298.899 |
| 0.89 | 1700 | 0.5638 | 0.745 | 1.5433 | -2.991 | -4.5343 | 0.7336 | 0.6782 | -328.2657 | -307.5182 |
| 0.94 | 1800 | 0.5559 | 0.7181 | 1.4484 | -2.8818 | -4.3302 | 0.7997 | 0.8327 | -316.2716 | -295.1836 |
| 0.99 | 1900 | 0.5627 | 0.7387 | 1.5378 | -2.7941 | -4.332 | 0.8573 | 0.858 | -324.9405 | -310.1192 |

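The reward columns appear to follow the logging convention of Hugging Face TRL's `DPOTrainer` (an assumption, since the card does not say which trainer Firefly uses internally): each pair's implicit reward is β times the policy-minus-reference log-probability, the margin is the chosen minus the rejected reward, and accuracy is the share of pairs whose chosen reward is higher. The sketch below computes those statistics for a hypothetical batch and checks that, in the logged table, Rewards/margins equals Rewards/chosen minus Rewards/rejected up to rounding. The function names and the β value are illustrative.

```python
import math

# (Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the table above, steps 100-1900.
LOGGED = [
    (0.0404, -0.2774, 0.3179), (-0.1704, -0.7693, 0.5988),
    (-0.282, -1.1133, 0.8314), (-0.1944, -1.1313, 0.9369),
    (-0.4524, -1.3309, 0.8784), (-0.4505, -1.4698, 1.0192),
    (-0.6998, -1.747, 1.0471), (-0.8185, -1.8671, 1.0485),
    (-1.2085, -2.3939, 1.1854), (-0.8423, -1.9881, 1.1458),
    (-1.4944, -2.7841, 1.2897), (-1.7543, -3.0002, 1.2459),
    (-1.8485, -3.1864, 1.3379), (-1.9865, -3.3939, 1.4074),
    (-2.1183, -3.3766, 1.2582), (-2.1429, -3.6139, 1.471),
    (-2.991, -4.5343, 1.5433), (-2.8818, -4.3302, 1.4484),
    (-2.7941, -4.332, 1.5378),
]


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_batch_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss and TRL-style reward statistics for a batch of preference pairs.

    Inputs are per-example summed log-probabilities of each response.
    beta=0.1 is a placeholder; the card does not state the value used.
    """
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    return {
        # -log(sigmoid(m)) == softplus(-m)
        "loss": sum(softplus(-m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
    }


# The logged margin should equal chosen minus rejected reward, up to 4-decimal rounding.
worst_gap = max(abs((c - r) - m) for c, r, m in LOGGED)
print(f"largest gap between (chosen - rejected) and margin: {worst_gap:.4f}")
```

On the logged rows the largest gap is about 0.0001, which is pure rounding and consistent with the margin being the batch mean of per-pair margins.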