yuvraj17 commited on
Commit
302527c
1 Parent(s): 66cc4e7

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ llama3-8b-supernova-spectrum-hermes-dpo.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-8b-supernova-spectrum-hermes-dpo.bf16.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,90 @@
---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- dpo
- rlhf
- trl
- autoquant
- gguf
---

# Llama3-8B-SuperNova-Spectrum-Hermes-DPO

This model is a **DPO fine-tuned** version of my `DARE_TIES` merged model [`yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties`](https://huggingface.co/yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties), trained on the [yuvraj17/chatml-OpenHermes2.5-dpo-binarized-alpha-2k](https://huggingface.co/datasets/yuvraj17/chatml-OpenHermes2.5-dpo-binarized-alpha-2k) dataset.

## DPO (Direct Preference Optimization)

Direct Preference Optimization (DPO) is a fine-tuning technique that aligns a model's responses directly with human preference rankings, without the separate reward model and reinforcement-learning loop required by RLHF.
22
+
23
+ <figure>
24
+
25
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/66137d95e8d2cda230ddcea6/kHcU5dkcSVqxEIWt_GRUB.png" width="1000" height="768">
26
+ <figcaption> DPO vs RLHF <a href="//arxiv.org/abs/2305.18290">Reference</a> </figcaption>
27
+
28
+ </figure>
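
Concretely, the DPO objective is a logistic loss on a scaled reward margin computed from policy and reference log-probabilities. A minimal numeric sketch (illustrative values only, not this model's actual training code):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed log-probs of the trainable policy (pi_*)
    and the frozen reference model (ref_*) on the chosen/rejected responses."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)): falls below log(2) once the policy prefers the
    # chosen response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(dpo_loss(-10.0, -14.0, -11.0, -13.0))  # positive margin -> loss < log(2)
```

With a small `beta` such as 0.1 (the value used in training below), the margin is gently scaled, so the policy is only softly pushed away from the reference model.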

## Training

- Trained on **1x A40 (48GB VRAM)** using [HuggingFace TRL](https://huggingface.co/docs/trl/index).
- **QLoRA** (`4-bit` precision) for 1 epoch:
```python
# LoRA configuration
from peft import LoraConfig

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
```
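
As a rough sanity check on adapter size, the trainable-parameter count implied by this config can be estimated from standard Llama-3-8B projection shapes (hidden 4096, MLP 14336, 32 layers, 8 KV heads of dim 128; these shapes are assumed here, not read from the checkpoint):

```python
# Each LoRA adapter adds r * (d_in + d_out) parameters per targeted matrix.
r, layers = 32, 32
hidden, inter, kv_dim = 4096, 14336, 1024  # kv_dim = 8 KV heads * 128 head_dim

shapes = {
    "q_proj": (hidden, hidden), "o_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim), "v_proj": (hidden, kv_dim),
    "gate_proj": (hidden, inter), "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}
trainable = layers * sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
print(f"{trainable:,} trainable LoRA parameters (~1% of the 8B base)")
```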
45
+ ### Training Params
46
+
47
+ The following hyperparameters were used during training:
48
+ - learning_rate: 5e-05
49
+ - beta=0.1
50
+ - num_devices: 1
51
+ - gradient_accumulation_steps: 4
52
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
53
+ - lr_scheduler_type: cosine
54
+ - lr_scheduler_warmup_steps: 100
55
+ - num_epochs: 1
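
The scheduler settings above amount to a linear warmup over 100 steps followed by cosine decay. A small sketch of the resulting learning-rate curve (the total step count is an assumption for illustration; the real value depends on dataset and batch size):

```python
import math

def lr_at(step, base_lr=5e-5, warmup=100, total=500):
    """Linear warmup to base_lr over `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(50), lr_at(100), lr_at(500))  # mid-warmup, peak, end of training
```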

### Training Time: **1:57:00** (hh:mm:ss)

### Weights & Biases Report

[Report-Link](https://api.wandb.ai/links/my-sft-team/d211juao)

## 💻 Usage

```python
# Install dependencies first (e.g. `pip install -qU transformers accelerate`)
from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## 🏆 Evaluation Scores

Coming soon.
llama3-8b-supernova-spectrum-hermes-dpo.Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f26a7a303c5b1cc18f65f1aecefaec70e6ce3b2f551a4fb9c170655f078b68da
size 4018917920
llama3-8b-supernova-spectrum-hermes-dpo.bf16.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b4ce5eccac9eb4aa88a469e0c1701b08bd219256515c079aaf9a67eb242a4de
size 16068891168
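
The two GGUF sizes above translate to effective bits per weight roughly as follows (a back-of-the-envelope sketch assuming ~8.03B parameters for a Llama-3-8B model; the exact count depends on how embeddings are tallied):

```python
PARAMS = 8.03e9  # assumed total parameter count for Llama-3-8B

def bits_per_weight(file_size_bytes):
    # GGUF file size is dominated by the (quantized) weight tensors themselves.
    return file_size_bytes * 8 / PARAMS

q3_k_m = bits_per_weight(4_018_917_920)   # Q3_K_M file
bf16 = bits_per_weight(16_068_891_168)    # bf16 file
print(f"Q3_K_M ~{q3_k_m:.1f} bpw, bf16 ~{bf16:.1f} bpw")
```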