RichardErkhov committed on
Commit e01db40
1 Parent(s): 838d90c

uploaded readme

Files changed (1)
  1. README.md +483 -0
README.md ADDED
@@ -0,0 +1,483 @@
1
+ Quantization made by Richard Erkhov.
2
+
3
+ [Github](https://github.com/RichardErkhov)
4
+
5
+ [Discord](https://discord.gg/pvy7H8DZMG)
6
+
7
+ [Request more models](https://github.com/RichardErkhov/quant_request)
8
+
9
+
10
+ zephyr-7b-beta-128k - GGUF
11
+ - Model creator: https://huggingface.co/CallComply/
12
+ - Original model: https://huggingface.co/CallComply/zephyr-7b-beta-128k/
13
+
14
+
15
+ | Name | Quant method | Size |
16
+ | ---- | ---- | ---- |
17
+ | [zephyr-7b-beta-128k.Q2_K.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q2_K.gguf) | Q2_K | 2.53GB |
18
+ | [zephyr-7b-beta-128k.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.IQ3_XS.gguf) | IQ3_XS | 2.81GB |
19
+ | [zephyr-7b-beta-128k.IQ3_S.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.IQ3_S.gguf) | IQ3_S | 2.96GB |
20
+ | [zephyr-7b-beta-128k.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q3_K_S.gguf) | Q3_K_S | 1.8GB |
21
+ | [zephyr-7b-beta-128k.IQ3_M.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.IQ3_M.gguf) | IQ3_M | 0.98GB |
22
+ | [zephyr-7b-beta-128k.Q3_K.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q3_K.gguf) | Q3_K | 3.28GB |
23
+ | [zephyr-7b-beta-128k.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q3_K_M.gguf) | Q3_K_M | 3.28GB |
24
+ | [zephyr-7b-beta-128k.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q3_K_L.gguf) | Q3_K_L | 3.56GB |
25
+ | [zephyr-7b-beta-128k.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.IQ4_XS.gguf) | IQ4_XS | 3.67GB |
26
+ | [zephyr-7b-beta-128k.Q4_0.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q4_0.gguf) | Q4_0 | 3.83GB |
27
+ | [zephyr-7b-beta-128k.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.IQ4_NL.gguf) | IQ4_NL | 3.87GB |
28
+ | [zephyr-7b-beta-128k.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q4_K_S.gguf) | Q4_K_S | 3.86GB |
29
+ | [zephyr-7b-beta-128k.Q4_K.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q4_K.gguf) | Q4_K | 4.07GB |
30
+ | [zephyr-7b-beta-128k.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q4_K_M.gguf) | Q4_K_M | 4.07GB |
31
+ | [zephyr-7b-beta-128k.Q4_1.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q4_1.gguf) | Q4_1 | 4.24GB |
32
+ | [zephyr-7b-beta-128k.Q5_0.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q5_0.gguf) | Q5_0 | 4.65GB |
33
+ | [zephyr-7b-beta-128k.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q5_K_S.gguf) | Q5_K_S | 4.65GB |
34
+ | [zephyr-7b-beta-128k.Q5_K.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q5_K.gguf) | Q5_K | 4.78GB |
35
+ | [zephyr-7b-beta-128k.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q5_K_M.gguf) | Q5_K_M | 4.78GB |
36
+ | [zephyr-7b-beta-128k.Q5_1.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q5_1.gguf) | Q5_1 | 5.07GB |
37
+ | [zephyr-7b-beta-128k.Q6_K.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q6_K.gguf) | Q6_K | 5.53GB |
38
+ | [zephyr-7b-beta-128k.Q8_0.gguf](https://huggingface.co/RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf/blob/main/zephyr-7b-beta-128k.Q8_0.gguf) | Q8_0 | 7.17GB |
39
+
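+ Any llama.cpp-compatible runtime can load the files listed above. Below is a minimal, illustrative sketch (not part of the original model card) that downloads one of the quants with `huggingface_hub` and runs it with `llama-cpp-python`; the choice of the Q4_K_M file, the 8k context window, and the sampling settings are assumptions you should adjust to your hardware and use case:
+ 
+ ```python
+ # pip install huggingface_hub llama-cpp-python
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+ 
+ # Download one of the quantized files from the table above (Q4_K_M picked as an example)
+ model_path = hf_hub_download(
+     repo_id="RichardErkhov/CallComply_-_zephyr-7b-beta-128k-gguf",
+     filename="zephyr-7b-beta-128k.Q4_K_M.gguf",
+ )
+ 
+ # Load the model; n_ctx is kept far below 128k here to keep memory usage modest
+ llm = Llama(model_path=model_path, n_ctx=8192)
+ 
+ # Zephyr's chat format: <|system|> ... </s> <|user|> ... </s> <|assistant|>
+ prompt = (
+     "<|system|>\nYou are a friendly chatbot who always responds in the style of a pirate</s>\n"
+     "<|user|>\nHow many helicopters can a human eat in one sitting?</s>\n"
+     "<|assistant|>\n"
+ )
+ output = llm(prompt, max_tokens=256, temperature=0.7, stop=["</s>"])
+ print(output["choices"][0]["text"])
+ ```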
40
+
41
+
42
+
43
+ Original model description:
44
+ ---
45
+ language:
46
+ - en
47
+ license: mit
48
+ tags:
49
+ - generated_from_trainer
50
+ datasets:
51
+ - HuggingFaceH4/ultrachat_200k
52
+ - HuggingFaceH4/ultrafeedback_binarized
53
+ base_model: mistralai/Mistral-7B-v0.1
54
+ widget:
55
+ - text: '<|system|>
56
+
57
+ You are a pirate chatbot who always responds with Arr!</s>
58
+
59
+ <|user|>
60
+
61
+ There''s a llama on my lawn, how can I get rid of him?</s>
62
+
63
+ <|assistant|>
64
+
65
+ '
66
+ output:
67
+ text: Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare sight,
68
+ but I've got a plan that might help ye get rid of 'im. Ye'll need to gather
69
+ some carrots and hay, and then lure the llama away with the promise of a tasty
70
+ treat. Once he's gone, ye can clean up yer lawn and enjoy the peace and quiet
71
+ once again. But beware, me hearty, for there may be more llamas where that one
72
+ came from! Arr!
73
+ pipeline_tag: text-generation
74
+ model-index:
75
+ - name: zephyr-7b-beta
76
+ results:
77
+ - task:
78
+ type: text-generation
79
+ name: Text Generation
80
+ dataset:
81
+ name: AI2 Reasoning Challenge (25-Shot)
82
+ type: ai2_arc
83
+ config: ARC-Challenge
84
+ split: test
85
+ args:
86
+ num_few_shot: 25
87
+ metrics:
88
+ - type: acc_norm
89
+ value: 62.03071672354948
90
+ name: normalized accuracy
91
+ - type: acc_norm
92
+ value: 58.28
93
+ name: normalized accuracy
94
+ source:
95
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
96
+ name: Open LLM Leaderboard
97
+ - task:
98
+ type: text-generation
99
+ name: Text Generation
100
+ dataset:
101
+ name: HellaSwag (10-Shot)
102
+ type: hellaswag
103
+ split: validation
104
+ args:
105
+ num_few_shot: 10
106
+ metrics:
107
+ - type: acc_norm
108
+ value: 84.35570603465445
109
+ name: normalized accuracy
110
+ - type: acc_norm
111
+ value: 81.0
112
+ name: normalized accuracy
113
+ source:
114
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
115
+ name: Open LLM Leaderboard
116
+ - task:
117
+ type: text-generation
118
+ name: Text Generation
119
+ dataset:
120
+ name: Drop (3-Shot)
121
+ type: drop
122
+ split: validation
123
+ args:
124
+ num_few_shot: 3
125
+ metrics:
126
+ - type: f1
127
+ value: 9.66243708053691
128
+ name: f1 score
129
+ source:
130
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
131
+ name: Open LLM Leaderboard
132
+ - task:
133
+ type: text-generation
134
+ name: Text Generation
135
+ dataset:
136
+ name: TruthfulQA (0-shot)
137
+ type: truthful_qa
138
+ config: multiple_choice
139
+ split: validation
140
+ args:
141
+ num_few_shot: 0
142
+ metrics:
143
+ - type: mc2
144
+ value: 57.44916942762855
145
+ - type: mc2
146
+ value: 46.1
147
+ source:
148
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
149
+ name: Open LLM Leaderboard
150
+ - task:
151
+ type: text-generation
152
+ name: Text Generation
153
+ dataset:
154
+ name: GSM8k (5-shot)
155
+ type: gsm8k
156
+ config: main
157
+ split: test
158
+ args:
159
+ num_few_shot: 5
160
+ metrics:
161
+ - type: acc
162
+ value: 12.736921910538287
163
+ name: accuracy
164
+ - type: acc
165
+ value: 13.04
166
+ name: accuracy
167
+ source:
168
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
169
+ name: Open LLM Leaderboard
170
+ - task:
171
+ type: text-generation
172
+ name: Text Generation
173
+ dataset:
174
+ name: MMLU (5-Shot)
175
+ type: cais/mmlu
176
+ config: all
177
+ split: test
178
+ args:
179
+ num_few_shot: 5
180
+ metrics:
181
+ - type: acc
182
+ value: 61.07
183
+ name: accuracy
184
+ - type: acc
185
+ value: 53.57
186
+ name: accuracy
187
+ source:
188
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
189
+ name: Open LLM Leaderboard
190
+ - task:
191
+ type: text-generation
192
+ name: Text Generation
193
+ dataset:
194
+ name: Winogrande (5-shot)
195
+ type: winogrande
196
+ config: winogrande_xl
197
+ split: validation
198
+ args:
199
+ num_few_shot: 5
200
+ metrics:
201
+ - type: acc
202
+ value: 77.7426992896606
203
+ name: accuracy
204
+ - type: acc
205
+ value: 74.74
206
+ name: accuracy
207
+ source:
208
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
209
+ name: Open LLM Leaderboard
210
+ - task:
211
+ type: text-generation
212
+ name: Text Generation
213
+ dataset:
214
+ name: AlpacaEval
215
+ type: tatsu-lab/alpaca_eval
216
+ metrics:
217
+ - type: unknown
218
+ value: 0.906
219
+ name: win rate
220
+ source:
221
+ url: https://tatsu-lab.github.io/alpaca_eval/
222
+ - task:
223
+ type: text-generation
224
+ name: Text Generation
225
+ dataset:
226
+ name: MT-Bench
227
+ type: unknown
228
+ metrics:
229
+ - type: unknown
230
+ value: 7.34
231
+ name: score
232
+ source:
233
+ url: https://huggingface.co/spaces/lmsys/mt-bench
234
+ ---
235
+
236
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
237
+ should probably proofread and complete it, then remove this comment. -->
238
+
239
+ <img src="https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha/resolve/main/thumbnail.png" alt="Zephyr Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
240
+
241
+
242
+ # Model Card for Zephyr 7B β
243
+
244
+ Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained on a mix of publicly available, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). We found that removing the in-built alignment of these datasets boosted performance on [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and made the model more helpful. However, this means that the model is likely to generate problematic text when prompted to do so. You can find more details in the [technical report](https://arxiv.org/abs/2310.16944).
245
+
246
+
247
+ ## Model description
248
+
249
+ - **Model type:** A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
250
+ - **Language(s) (NLP):** Primarily English
251
+ - **License:** MIT
252
+ - **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
253
+
254
+ ### Model Sources
255
+
256
+ <!-- Provide the basic links for the model. -->
257
+
258
+ - **Repository:** https://github.com/huggingface/alignment-handbook
259
+ - **Demo:** https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat
260
+ - **Chatbot Arena:** Evaluate Zephyr 7B against 10+ LLMs in the LMSYS arena: http://arena.lmsys.org
261
+
262
+ ## Performance
263
+
264
+ At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
265
+
266
+ | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
267
+ |-------------|-----|----|---------------|--------------|
268
+ | StableLM-Tuned-α | 7B| dSFT |2.75| -|
269
+ | MPT-Chat | 7B |dSFT |5.42| -|
270
+ | Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
271
+ | Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
272
+ | Zephyr-7b-α |7B| dDPO| 6.88| -|
273
+ | **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
274
+ | Falcon-Instruct | 40B |dSFT |5.17 |45.71|
275
+ | Guanaco | 65B | SFT |6.41| 71.80|
276
+ | Llama2-Chat | 70B |RLHF |6.86| 92.66|
277
+ | Vicuna v1.3 | 33B |dSFT |7.12 |88.99|
278
+ | WizardLM v1.0 | 70B |dSFT |7.71 |-|
279
+ | Xwin-LM v0.1 | 70B |dPPO |- |95.57|
280
+ | GPT-3.5-turbo | - |RLHF |7.94 |89.37|
281
+ | Claude 2 | - |RLHF |8.06| 91.36|
282
+ | GPT-4 | -| RLHF |8.99| 95.28|
283
+
284
+ In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:
285
+
286
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/raxvt5ma16d7T23my34WC.png)
287
+
288
+ However, on more complex tasks like coding and mathematics, Zephyr-7B-β lags behind proprietary models and more research is needed to close the gap.
289
+
290
+
291
+ ## Intended uses & limitations
292
+
293
+ The model was initially fine-tuned on a filtered and preprocessed version of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
294
+ We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4. As a result, the model can be used for chat and you can check out our [demo](https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat) to test its capabilities.
295
+
296
+ You can find the datasets used for training Zephyr-7B-β [here](https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66).
297
+
298
+ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
299
+
300
+ ```python
301
+ # Install transformers from source - only needed for versions <= v4.34
302
+ # pip install git+https://github.com/huggingface/transformers.git
303
+ # pip install accelerate
304
+
305
+ import torch
306
+ from transformers import pipeline
307
+
308
+ pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
309
+
310
+ # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
311
+ messages = [
312
+ {
313
+ "role": "system",
314
+ "content": "You are a friendly chatbot who always responds in the style of a pirate",
315
+ },
316
+ {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
317
+ ]
318
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
319
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
320
+ print(outputs[0]["generated_text"])
321
+ # <|system|>
322
+ # You are a friendly chatbot who always responds in the style of a pirate.</s>
323
+ # <|user|>
324
+ # How many helicopters can a human eat in one sitting?</s>
325
+ # <|assistant|>
326
+ # Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
327
+ ```
328
+
329
+ ## Bias, Risks, and Limitations
330
+
331
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
332
+
333
+ Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
334
+ The size and composition of the corpus used to train the base model (`mistralai/Mistral-7B-v0.1`) are also unknown; however, it likely included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
335
+
336
+
337
+ ## Training and evaluation data
338
+
339
+ During DPO training, this model achieves the following results on the evaluation set:
340
+
341
+ - Loss: 0.7496
342
+ - Rewards/chosen: -4.5221
343
+ - Rewards/rejected: -8.3184
344
+ - Rewards/accuracies: 0.7812
345
+ - Rewards/margins: 3.7963
346
+ - Logps/rejected: -340.1541
347
+ - Logps/chosen: -299.4561
348
+ - Logits/rejected: -2.3081
349
+ - Logits/chosen: -2.3531
350
+
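+ As a brief note on how to read the reward metrics above (following the DPO formulation from the paper linked earlier), the implicit reward assigned to a completion is the scaled log-ratio between the policy and the frozen reference model, and the loss pushes the chosen completion's reward above the rejected one's; "Rewards/accuracies" is the fraction of pairs where that ordering holds and "Rewards/margins" is the mean gap:
+ 
+ ```latex
+ r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
+ \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\bigr)
+ ```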
351
+
352
+ ### Training hyperparameters
353
+
354
+ The following hyperparameters were used during training:
355
+ - learning_rate: 5e-07
356
+ - train_batch_size: 2
357
+ - eval_batch_size: 4
358
+ - seed: 42
359
+ - distributed_type: multi-GPU
360
+ - num_devices: 16
361
+ - total_train_batch_size: 32
362
+ - total_eval_batch_size: 64
363
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
364
+ - lr_scheduler_type: linear
365
+ - lr_scheduler_warmup_ratio: 0.1
366
+ - num_epochs: 3.0
367
+
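+ The hyperparameters above map roughly onto 🤗 TRL's `DPOTrainer`. The sketch below is an illustrative assumption rather than the exact training script: the SFT starting checkpoint, the `beta` value, and the toy preference dataset are placeholders, and the `DPOTrainer` signature varies between TRL versions (this roughly matches TRL ~0.7, which was current at the time of release):
+ 
+ ```python
+ from datasets import Dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+ from trl import DPOTrainer
+ 
+ model_name = "HuggingFaceH4/mistral-7b-sft-beta"  # assumed SFT starting point
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+ ref_model = AutoModelForCausalLM.from_pretrained(model_name)  # frozen reference policy
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ 
+ # Toy stand-in for the preference data; in practice ultrafeedback_binarized is
+ # flattened into plain-text "prompt" / "chosen" / "rejected" columns.
+ train_dataset = Dataset.from_dict({
+     "prompt": ["<|user|>\nName a colour.</s>\n<|assistant|>\n"],
+     "chosen": ["Blue.</s>"],
+     "rejected": ["I refuse to answer.</s>"],
+ })
+ 
+ training_args = TrainingArguments(
+     output_dir="zephyr-7b-dpo",
+     learning_rate=5e-7,
+     per_device_train_batch_size=2,
+     per_device_eval_batch_size=4,
+     num_train_epochs=3,
+     lr_scheduler_type="linear",
+     warmup_ratio=0.1,
+     bf16=True,
+     seed=42,
+ )
+ 
+ trainer = DPOTrainer(
+     model,
+     ref_model,
+     args=training_args,
+     beta=0.1,  # assumed; scales the implicit reward described in the note above
+     train_dataset=train_dataset,
+     tokenizer=tokenizer,
+ )
+ trainer.train()
+ ```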
368
+ ### Training results
369
+
370
+ The table below shows the full set of DPO training metrics:
371
+
372
+
373
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
374
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
375
+ | 0.6284 | 0.05 | 100 | 0.6098 | 0.0425 | -0.1872 | 0.7344 | 0.2297 | -258.8416 | -253.8099 | -2.7976 | -2.8234 |
376
+ | 0.4908 | 0.1 | 200 | 0.5426 | -0.0279 | -0.6842 | 0.75 | 0.6563 | -263.8124 | -254.5145 | -2.7719 | -2.7960 |
377
+ | 0.5264 | 0.15 | 300 | 0.5324 | 0.0414 | -0.9793 | 0.7656 | 1.0207 | -266.7627 | -253.8209 | -2.7892 | -2.8122 |
378
+ | 0.5536 | 0.21 | 400 | 0.4957 | -0.0185 | -1.5276 | 0.7969 | 1.5091 | -272.2460 | -254.4203 | -2.8542 | -2.8764 |
379
+ | 0.5362 | 0.26 | 500 | 0.5031 | -0.2630 | -1.5917 | 0.7812 | 1.3287 | -272.8869 | -256.8653 | -2.8702 | -2.8958 |
380
+ | 0.5966 | 0.31 | 600 | 0.5963 | -0.2993 | -1.6491 | 0.7812 | 1.3499 | -273.4614 | -257.2279 | -2.8778 | -2.8986 |
381
+ | 0.5014 | 0.36 | 700 | 0.5382 | -0.2859 | -1.4750 | 0.75 | 1.1891 | -271.7204 | -257.0942 | -2.7659 | -2.7869 |
382
+ | 0.5334 | 0.41 | 800 | 0.5677 | -0.4289 | -1.8968 | 0.7969 | 1.4679 | -275.9378 | -258.5242 | -2.7053 | -2.7265 |
383
+ | 0.5251 | 0.46 | 900 | 0.5772 | -0.2116 | -1.3107 | 0.7344 | 1.0991 | -270.0768 | -256.3507 | -2.8463 | -2.8662 |
384
+ | 0.5205 | 0.52 | 1000 | 0.5262 | -0.3792 | -1.8585 | 0.7188 | 1.4793 | -275.5552 | -258.0276 | -2.7893 | -2.7979 |
385
+ | 0.5094 | 0.57 | 1100 | 0.5433 | -0.6279 | -1.9368 | 0.7969 | 1.3089 | -276.3377 | -260.5136 | -2.7453 | -2.7536 |
386
+ | 0.5837 | 0.62 | 1200 | 0.5349 | -0.3780 | -1.9584 | 0.7656 | 1.5804 | -276.5542 | -258.0154 | -2.7643 | -2.7756 |
387
+ | 0.5214 | 0.67 | 1300 | 0.5732 | -1.0055 | -2.2306 | 0.7656 | 1.2251 | -279.2761 | -264.2903 | -2.6986 | -2.7113 |
388
+ | 0.6914 | 0.72 | 1400 | 0.5137 | -0.6912 | -2.1775 | 0.7969 | 1.4863 | -278.7448 | -261.1467 | -2.7166 | -2.7275 |
389
+ | 0.4655 | 0.77 | 1500 | 0.5090 | -0.7987 | -2.2930 | 0.7031 | 1.4943 | -279.8999 | -262.2220 | -2.6651 | -2.6838 |
390
+ | 0.5731 | 0.83 | 1600 | 0.5312 | -0.8253 | -2.3520 | 0.7812 | 1.5268 | -280.4902 | -262.4876 | -2.6543 | -2.6728 |
391
+ | 0.5233 | 0.88 | 1700 | 0.5206 | -0.4573 | -2.0951 | 0.7812 | 1.6377 | -277.9205 | -258.8084 | -2.6870 | -2.7097 |
392
+ | 0.5593 | 0.93 | 1800 | 0.5231 | -0.5508 | -2.2000 | 0.7969 | 1.6492 | -278.9703 | -259.7433 | -2.6221 | -2.6519 |
393
+ | 0.4967 | 0.98 | 1900 | 0.5290 | -0.5340 | -1.9570 | 0.8281 | 1.4230 | -276.5395 | -259.5749 | -2.6564 | -2.6878 |
394
+ | 0.0921 | 1.03 | 2000 | 0.5368 | -1.1376 | -3.1615 | 0.7812 | 2.0239 | -288.5854 | -265.6111 | -2.6040 | -2.6345 |
395
+ | 0.0733 | 1.08 | 2100 | 0.5453 | -1.1045 | -3.4451 | 0.7656 | 2.3406 | -291.4208 | -265.2799 | -2.6289 | -2.6595 |
396
+ | 0.0972 | 1.14 | 2200 | 0.5571 | -1.6915 | -3.9823 | 0.8125 | 2.2908 | -296.7934 | -271.1505 | -2.6471 | -2.6709 |
397
+ | 0.1058 | 1.19 | 2300 | 0.5789 | -1.0621 | -3.8941 | 0.7969 | 2.8319 | -295.9106 | -264.8563 | -2.5527 | -2.5798 |
398
+ | 0.2423 | 1.24 | 2400 | 0.5455 | -1.1963 | -3.5590 | 0.7812 | 2.3627 | -292.5599 | -266.1981 | -2.5414 | -2.5784 |
399
+ | 0.1177 | 1.29 | 2500 | 0.5889 | -1.8141 | -4.3942 | 0.7969 | 2.5801 | -300.9120 | -272.3761 | -2.4802 | -2.5189 |
400
+ | 0.1213 | 1.34 | 2600 | 0.5683 | -1.4608 | -3.8420 | 0.8125 | 2.3812 | -295.3901 | -268.8436 | -2.4774 | -2.5207 |
401
+ | 0.0889 | 1.39 | 2700 | 0.5890 | -1.6007 | -3.7337 | 0.7812 | 2.1330 | -294.3068 | -270.2423 | -2.4123 | -2.4522 |
402
+ | 0.0995 | 1.45 | 2800 | 0.6073 | -1.5519 | -3.8362 | 0.8281 | 2.2843 | -295.3315 | -269.7538 | -2.4685 | -2.5050 |
403
+ | 0.1145 | 1.5 | 2900 | 0.5790 | -1.7939 | -4.2876 | 0.8438 | 2.4937 | -299.8461 | -272.1744 | -2.4272 | -2.4674 |
404
+ | 0.0644 | 1.55 | 3000 | 0.5735 | -1.7285 | -4.2051 | 0.8125 | 2.4766 | -299.0209 | -271.5201 | -2.4193 | -2.4574 |
405
+ | 0.0798 | 1.6 | 3100 | 0.5537 | -1.7226 | -4.2850 | 0.8438 | 2.5624 | -299.8200 | -271.4610 | -2.5367 | -2.5696 |
406
+ | 0.1013 | 1.65 | 3200 | 0.5575 | -1.5715 | -3.9813 | 0.875 | 2.4098 | -296.7825 | -269.9498 | -2.4926 | -2.5267 |
407
+ | 0.1254 | 1.7 | 3300 | 0.5905 | -1.6412 | -4.4703 | 0.8594 | 2.8291 | -301.6730 | -270.6473 | -2.5017 | -2.5340 |
408
+ | 0.085 | 1.76 | 3400 | 0.6133 | -1.9159 | -4.6760 | 0.8438 | 2.7601 | -303.7296 | -273.3941 | -2.4614 | -2.4960 |
409
+ | 0.065 | 1.81 | 3500 | 0.6074 | -1.8237 | -4.3525 | 0.8594 | 2.5288 | -300.4951 | -272.4724 | -2.4597 | -2.5004 |
410
+ | 0.0755 | 1.86 | 3600 | 0.5836 | -1.9252 | -4.4005 | 0.8125 | 2.4753 | -300.9748 | -273.4872 | -2.4327 | -2.4716 |
411
+ | 0.0746 | 1.91 | 3700 | 0.5789 | -1.9280 | -4.4906 | 0.8125 | 2.5626 | -301.8762 | -273.5149 | -2.4686 | -2.5115 |
412
+ | 0.1348 | 1.96 | 3800 | 0.6015 | -1.8658 | -4.2428 | 0.8281 | 2.3769 | -299.3976 | -272.8936 | -2.4943 | -2.5393 |
413
+ | 0.0217 | 2.01 | 3900 | 0.6122 | -2.3335 | -4.9229 | 0.8281 | 2.5894 | -306.1988 | -277.5699 | -2.4841 | -2.5272 |
414
+ | 0.0219 | 2.07 | 4000 | 0.6522 | -2.9890 | -6.0164 | 0.8281 | 3.0274 | -317.1334 | -284.1248 | -2.4105 | -2.4545 |
415
+ | 0.0119 | 2.12 | 4100 | 0.6922 | -3.4777 | -6.6749 | 0.7969 | 3.1972 | -323.7187 | -289.0121 | -2.4272 | -2.4699 |
416
+ | 0.0153 | 2.17 | 4200 | 0.6993 | -3.2406 | -6.6775 | 0.7969 | 3.4369 | -323.7453 | -286.6413 | -2.4047 | -2.4465 |
417
+ | 0.011 | 2.22 | 4300 | 0.7178 | -3.7991 | -7.4397 | 0.7656 | 3.6406 | -331.3667 | -292.2260 | -2.3843 | -2.4290 |
418
+ | 0.0072 | 2.27 | 4400 | 0.6840 | -3.3269 | -6.8021 | 0.8125 | 3.4752 | -324.9908 | -287.5042 | -2.4095 | -2.4536 |
419
+ | 0.0197 | 2.32 | 4500 | 0.7013 | -3.6890 | -7.3014 | 0.8125 | 3.6124 | -329.9841 | -291.1250 | -2.4118 | -2.4543 |
420
+ | 0.0182 | 2.37 | 4600 | 0.7476 | -3.8994 | -7.5366 | 0.8281 | 3.6372 | -332.3356 | -293.2291 | -2.4163 | -2.4565 |
421
+ | 0.0125 | 2.43 | 4700 | 0.7199 | -4.0560 | -7.5765 | 0.8438 | 3.5204 | -332.7345 | -294.7952 | -2.3699 | -2.4100 |
422
+ | 0.0082 | 2.48 | 4800 | 0.7048 | -3.6613 | -7.1356 | 0.875 | 3.4743 | -328.3255 | -290.8477 | -2.3925 | -2.4303 |
423
+ | 0.0118 | 2.53 | 4900 | 0.6976 | -3.7908 | -7.3152 | 0.8125 | 3.5244 | -330.1224 | -292.1431 | -2.3633 | -2.4047 |
424
+ | 0.0118 | 2.58 | 5000 | 0.7198 | -3.9049 | -7.5557 | 0.8281 | 3.6508 | -332.5271 | -293.2844 | -2.3764 | -2.4194 |
425
+ | 0.006 | 2.63 | 5100 | 0.7506 | -4.2118 | -7.9149 | 0.8125 | 3.7032 | -336.1194 | -296.3530 | -2.3407 | -2.3860 |
426
+ | 0.0143 | 2.68 | 5200 | 0.7408 | -4.2433 | -7.9802 | 0.8125 | 3.7369 | -336.7721 | -296.6682 | -2.3509 | -2.3946 |
427
+ | 0.0057 | 2.74 | 5300 | 0.7552 | -4.3392 | -8.0831 | 0.7969 | 3.7439 | -337.8013 | -297.6275 | -2.3388 | -2.3842 |
428
+ | 0.0138 | 2.79 | 5400 | 0.7404 | -4.2395 | -7.9762 | 0.8125 | 3.7367 | -336.7322 | -296.6304 | -2.3286 | -2.3737 |
429
+ | 0.0079 | 2.84 | 5500 | 0.7525 | -4.4466 | -8.2196 | 0.7812 | 3.7731 | -339.1662 | -298.7007 | -2.3200 | -2.3641 |
430
+ | 0.0077 | 2.89 | 5600 | 0.7520 | -4.5586 | -8.3485 | 0.7969 | 3.7899 | -340.4545 | -299.8206 | -2.3078 | -2.3517 |
431
+ | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 |
432
+ | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 |
433
+
434
+
435
+ ### Framework versions
436
+
437
+ - Transformers 4.35.0.dev0
438
+ - Pytorch 2.0.1+cu118
439
+ - Datasets 2.12.0
440
+ - Tokenizers 0.14.0
441
+
442
+ ## Citation
443
+
444
+ If you find Zephyr-7B-β useful in your work, please cite it with:
445
+
446
+ ```
447
+ @misc{tunstall2023zephyr,
448
+ title={Zephyr: Direct Distillation of LM Alignment},
449
+ author={Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Clémentine Fourrier and Nathan Habib and Nathan Sarrazin and Omar Sanseviero and Alexander M. Rush and Thomas Wolf},
450
+ year={2023},
451
+ eprint={2310.16944},
452
+ archivePrefix={arXiv},
453
+ primaryClass={cs.LG}
454
+ }
455
+ ```
456
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) (original zephyr-7b-beta)
457
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
458
+
459
+ | Metric | Value |
460
+ |-----------------------|---------------------------|
461
+ | Avg. | 52.15 |
462
+ | ARC (25-shot) | 62.03 |
463
+ | HellaSwag (10-shot) | 84.36 |
464
+ | MMLU (5-shot) | 61.07 |
465
+ | TruthfulQA (0-shot) | 57.45 |
466
+ | Winogrande (5-shot) | 77.74 |
467
+ | GSM8K (5-shot) | 12.74 |
468
+ | DROP (3-shot) | 9.66 |
469
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) (zephyr-7b-beta-128k)
470
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CallComply__zephyr-7b-beta-128k)
471
+
472
+ | Metric |Value|
473
+ |---------------------------------|----:|
474
+ |Avg. |54.45|
475
+ |AI2 Reasoning Challenge (25-Shot)|58.28|
476
+ |HellaSwag (10-Shot) |81.00|
477
+ |MMLU (5-Shot) |53.57|
478
+ |TruthfulQA (0-shot) |46.10|
479
+ |Winogrande (5-shot) |74.74|
480
+ |GSM8k (5-shot) |13.04|
481
+
482
+
483
+