RichardErkhov committed 042766e (parent e7223ec): "uploaded readme". Adds `README.md` (+241 -0).
 
Quantization made by Richard Erkhov.

[GitHub](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


Llama3-8B-SuperNova-Spectrum-Hermes-DPO - GGUF
- Model creator: https://huggingface.co/yuvraj17/
- Original model: https://huggingface.co/yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q2_K.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q2_K.gguf) | Q2_K | 2.96GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ3_XS.gguf) | IQ3_XS | 3.28GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ3_S.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ3_S.gguf) | IQ3_S | 3.43GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K_S.gguf) | Q3_K_S | 3.41GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ3_M.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ3_M.gguf) | IQ3_M | 3.52GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K.gguf) | Q3_K | 3.74GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K_M.gguf) | Q3_K_M | 3.74GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q3_K_L.gguf) | Q3_K_L | 4.03GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ4_XS.gguf) | IQ4_XS | 4.18GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_0.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_0.gguf) | Q4_0 | 4.34GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.IQ4_NL.gguf) | IQ4_NL | 4.38GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_K_S.gguf) | Q4_K_S | 4.37GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_K.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_K.gguf) | Q4_K | 4.58GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_K_M.gguf) | Q4_K_M | 4.58GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_1.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q4_1.gguf) | Q4_1 | 4.78GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_0.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_0.gguf) | Q5_0 | 5.21GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_K_S.gguf) | Q5_K_S | 5.21GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_K.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_K.gguf) | Q5_K | 5.34GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_K_M.gguf) | Q5_K_M | 5.34GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_1.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q5_1.gguf) | Q5_1 | 5.65GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q6_K.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q6_K.gguf) | Q6_K | 6.14GB |
| [Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q8_0.gguf](https://huggingface.co/RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf/blob/main/Llama3-8B-SuperNova-Spectrum-Hermes-DPO.Q8_0.gguf) | Q8_0 | 7.95GB |
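
One way to choose among these files is to take the largest quant whose file fits your memory budget. The sketch below does that with the sizes copied from the table above; `pick_quant`, `download_url` and the 1 GB `headroom_gb` default are hypothetical choices for illustration, since actual RAM/VRAM use also includes the KV cache and runtime buffers. It also rewrites the table's `/blob/` page links to Hugging Face `/resolve/` URLs, which serve the raw file.

```python
from typing import Optional

# File sizes in GB, copied from the quant table (same order as the table)
QUANT_SIZES_GB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "IQ3_S": 3.43, "Q3_K_S": 3.41,
    "IQ3_M": 3.52, "Q3_K": 3.74, "Q3_K_M": 3.74, "Q3_K_L": 4.03,
    "IQ4_XS": 4.18, "Q4_0": 4.34, "IQ4_NL": 4.38, "Q4_K_S": 4.37,
    "Q4_K": 4.58, "Q4_K_M": 4.58, "Q4_1": 4.78, "Q5_0": 5.21,
    "Q5_K_S": 5.21, "Q5_K": 5.34, "Q5_K_M": 5.34, "Q5_1": 5.65,
    "Q6_K": 6.14, "Q8_0": 7.95,
}

REPO = "RichardErkhov/yuvraj17_-_Llama3-8B-SuperNova-Spectrum-Hermes-DPO-gguf"
STEM = "Llama3-8B-SuperNova-Spectrum-Hermes-DPO"


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Hypothetical helper: largest quant whose file size plus headroom fits the budget.

    Ties (e.g. Q4_K vs Q4_K_M) resolve to whichever appears first in the dict.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + headroom_gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_url(quant: str) -> str:
    # /resolve/main/ serves the raw file; the table's /blob/main/ links open a web page
    return f"https://huggingface.co/{REPO}/resolve/main/{STEM}.{quant}.gguf"


quant = pick_quant(budget_gb=6.0)
print(quant, download_url(quant))
```

With a 6 GB budget and 1 GB of headroom, only files up to 5.0 GB qualify, so the helper settles on Q4_1.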


Original model description:
---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- dpo
- rlhf
- trl
pipeline_tag: text-generation
model-index:
- name: Llama3-8B-SuperNova-Spectrum-Hermes-DPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 46.91
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 21.24
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 5.14
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.94
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.62
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 18.16
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO
      name: Open LLM Leaderboard
---

# Llama3-8B-SuperNova-Spectrum-Hermes-DPO

This model is a **DPO fine-tuned** version of my `DARE_TIES`-merged model [`yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties`](https://huggingface.co/yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties), trained on the [yuvraj17/chatml-OpenHermes2.5-dpo-binarized-alpha-2k](https://huggingface.co/datasets/yuvraj17/chatml-OpenHermes2.5-dpo-binarized-alpha-2k) dataset.

## DPO (Direct Preference Optimization):

Direct Preference Optimization (DPO) is a fine-tuning technique that aligns a model's responses with human preference (ranking) data directly, without the separate reward model and reinforcement-learning loop that RLHF requires.

<figure>

<img src="https://cdn-uploads.huggingface.co/production/uploads/66137d95e8d2cda230ddcea6/kHcU5dkcSVqxEIWt_GRUB.png" width="1000" height="768">
<figcaption> DPO vs RLHF <a href="//arxiv.org/abs/2305.18290">Reference</a> </figcaption>

</figure>
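
As a rough illustration of the objective from the referenced paper (arXiv 2305.18290), the per-pair DPO loss depends only on how much more the trainable policy prefers the chosen response over the rejected one, relative to a frozen reference model. This is a scalar sketch, not TRL's implementation (TRL's default sigmoid loss is the batched tensor analogue); `dpo_loss` and its argument names are hypothetical, and `beta=0.1` mirrors the training parameters listed for this model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    loss = -log(sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]))
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == softplus(-m); this form avoids overflow for large |m|
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin 0, loss = log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifted toward the chosen answer: margin 0.2, loss drops below log(2)
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Because the loss is a function of log-ratios against the reference model, no reward model has to be trained; `beta` controls how strongly the policy is pushed away from the reference.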

## Training:

- Trained on **1x A40 (48GB VRAM)** using [Hugging Face TRL](https://huggingface.co/docs/trl/index).
- **QLoRA** (4-bit precision) for 1 epoch

```python
from peft import LoraConfig

# LoRA configuration
peft_config = LoraConfig(
    r=32,                # rank of the low-rank update matrices
    lora_alpha=16,       # update is scaled by lora_alpha / r
    lora_dropout=0.05,
    bias="none",         # bias terms stay frozen
    task_type="CAUSAL_LM",
    # Adapt every attention and MLP projection in each decoder layer
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
```
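
To get a feel for how much this configuration trains, the sketch below counts the LoRA adapter weights for the seven targeted projections. It assumes standard Llama-3-8B shapes (hidden size 4096, MLP size 14336, 32 decoder layers, grouped-query attention with 8 KV heads of dimension 128), which this card does not state; `lora_trainable_params` is a hypothetical helper.

```python
# Assumed Llama-3-8B decoder shapes (not stated in this card)
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 4096, 14336, 32, 8 * 128

# (in_features, out_features) of each targeted linear layer in one decoder block
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(r: int) -> int:
    # Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights.
    # bias="none", so no bias parameters are added.
    per_layer = sum(r * (i + o) for i, o in MODULE_SHAPES.values())
    return per_layer * LAYERS


r, lora_alpha = 32, 16
scaling = lora_alpha / r  # default (non-rsLoRA) scaling applied to the B @ A update

print(f"{lora_trainable_params(r):,} trainable LoRA weights, scaling={scaling}")
```

Under these assumptions the adapters hold 83,886,080 weights, roughly 1% of the base model, and with `lora_alpha` set to half of `r` each update is scaled by 0.5.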
### Training Params

The following hyperparameters were used during training:
- learning_rate: 5e-05
- beta: 0.1
- num_devices: 1
- gradient_accumulation_steps: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1
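
The cosine schedule with 100 warmup steps can be pictured with the small function below, modeled on the linear-warmup-then-half-cosine shape of `transformers.get_cosine_schedule_with_warmup` (default `num_cycles=0.5`). The card does not give the per-device batch size, so `total_steps` (optimizer steps, i.e. after accumulating 4 micro-batches each) is a placeholder you must supply; `lr_at` is a hypothetical name.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 5e-5,
          warmup_steps: int = 100) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


# Hypothetical run length of 500 optimizer steps
for s in (0, 50, 100, 300, 500):
    print(s, lr_at(s, total_steps=500))
```

The rate peaks at 5e-05 exactly when warmup ends, falls to half of that midway through the remaining steps, and reaches zero at the final step.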

### Training Time: **1 h 57 min**

### Weights & Biases Report

[Report link](https://api.wandb.ai/links/my-sft-team/d211juao)

## 💻 Usage

```python
# Install the dependencies first (in a shell): pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

# Loads the original (non-GGUF) Transformers checkpoint from the Hugging Face Hub
model = "yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into a single prompt string via the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipe = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## 🏆 Evaluation Scores

Coming Soon

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_yuvraj17__Llama3-8B-SuperNova-Spectrum-Hermes-DPO).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 18.00 |
| IFEval (0-Shot)     | 46.91 |
| BBH (3-Shot)        | 21.24 |
| MATH Lvl 5 (4-Shot) |  5.14 |
| GPQA (0-shot)       |  6.94 |
| MuSR (0-shot)       |  9.62 |
| MMLU-PRO (5-shot)   | 18.16 |
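
The Avg. row appears to be the unweighted mean of the six benchmark scores; this quick check reproduces it from the values in the table (the dictionary is just a transcription of those rows).

```python
# Benchmark scores transcribed from the leaderboard table
scores = {
    "IFEval (0-Shot)": 46.91,
    "BBH (3-Shot)": 21.24,
    "MATH Lvl 5 (4-Shot)": 5.14,
    "GPQA (0-shot)": 6.94,
    "MuSR (0-shot)": 9.62,
    "MMLU-PRO (5-shot)": 18.16,
}

# Unweighted mean: 108.01 / 6
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 18.00
```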