decruz07 committed (verified)
Commit a2d8e7a
1 Parent(s): bb06404

Create README.md

Files changed (1)
  1. README.md +105 -0
README.md ADDED
@@ -0,0 +1,105 @@
+ ---
+ base_model: mlabonne/Marcoro14-7B-slerp
+ license: apache-2.0
+ datasets:
+ - argilla/distilabel-intel-orca-dpo-pairs
+ ---
+
+ # Model Card for decruz07/kellemar-DPO-Orca-Distilled-7B
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This model was created using mlabonne/Marcoro14-7B-slerp as the base and finetuned with argilla/distilabel-intel-orca-dpo-pairs.
+
+ These are the GGUF versions. Both Q4_K_M and Q5_K_M quantizations are available for download.
+
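+ Not part of the original card: a minimal sketch of pulling one of the quantized files with `huggingface_hub`. The repo id and GGUF filename below are assumptions; check this repo's file listing for the exact names.
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Assumed repo id and filename -- confirm both against this repo's "Files" tab.
+ model_path = hf_hub_download(
+     repo_id="decruz07/kellemar-DPO-Orca-Distilled-7B-GGUF",
+     filename="kellemar-dpo-orca-distilled-7b.Q4_K_M.gguf",  # or the Q5_K_M file
+ )
+ print(model_path)  # local path to the downloaded GGUF file
+ ```
+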
+ ## Model Details
+
+ Finetuned with these specific parameters:
+ - Steps: 200
+ - Learning rate: 5e-5
+ - Beta: 0.1
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** @decruz
+ - **Funded by [optional]:** my full-time job
+ - **Finetuned from model [optional]:** mlabonne/Marcoro14-7B-slerp
+
+ ## Benchmarks
+ Top 5 on the Open LLM Leaderboard as of 2024/01/17.
+
+ **OpenLLM**
+ | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+ |---|---|---|---|---|---|---|---|
+ | **kellemar-DPO-Orca-Distilled-7B-SLERP** | 73.71 | 70.48 | 87.56 | 65.33 | 64.97 | 81.93 | 72.02 |
+
+ **Nous**
+ | Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
+ |---|---|---|---|---|---|
+ | **kellemar-DPO-Orca-Distilled-7B-SLERP** | 45.27 | 76.42 | 65.48 | 47.21 | 58.6 |
+ | Marcoro14-7B-slerp | 44.66 | 76.24 | 64.15 | 45.64 | 57.67 |
+ | kellemar-DPO-Orca-Distilled-7B | 43.61 | 73.14 | 55.73 | 42.28 | 53.69 |
+ | kellemar-Orca-DPO-7B | 43.35 | 73.43 | 54.02 | 42.24 | 53.26 |
+ | OpenHermes-2.5-Mistral-7B | 43.07 | 73.12 | 53.04 | 40.96 | 52.38 |
+
+ ## Uses
+
+ You can use this model for basic inference, or finetune it further if you want to.
+
+
+ ## How to Get Started with the Model
+
+ You can create a Hugging Face Space from this model, or use basic Python code to load it and run inference against it.
+
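+ As a rough, hedged sketch (not from the original card), the GGUF files can be run locally with llama-cpp-python; the file path and prompt below are placeholders.
+
+ ```python
+ from llama_cpp import Llama
+
+ # Path to a downloaded GGUF file; the filename here is illustrative.
+ llm = Llama(
+     model_path="kellemar-dpo-orca-distilled-7b.Q4_K_M.gguf",
+     n_ctx=2048,
+ )
+
+ # Plain completion call; the preferred prompt/chat format for this model
+ # is not documented in this card, so treat the prompt as a placeholder.
+ output = llm("Explain DPO finetuning in one sentence.", max_tokens=128)
+ print(output["choices"][0]["text"])
+ ```
+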
+ ## Training Details
+
+ The following was used:
+
+ ```python
+ from transformers import TrainingArguments
+ from trl import DPOTrainer
+
+ # model, ref_model, dataset, tokenizer, peft_config and new_model are defined
+ # earlier in the training notebook.
+ training_args = TrainingArguments(
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=4,
+     gradient_checkpointing=True,
+     learning_rate=5e-5,
+     lr_scheduler_type="cosine",
+     max_steps=200,
+     save_strategy="no",
+     logging_steps=1,
+     output_dir=new_model,
+     optim="paged_adamw_32bit",
+     warmup_steps=100,
+     bf16=True,
+     report_to="wandb",
+ )
+
+ # Create DPO trainer
+ dpo_trainer = DPOTrainer(
+     model,
+     ref_model,
+     args=training_args,
+     train_dataset=dataset,
+     tokenizer=tokenizer,
+     peft_config=peft_config,
+     beta=0.1,
+     max_prompt_length=1024,
+     max_length=1536,
+ )
+
+ # Run the DPO finetune
+ dpo_trainer.train()
+ ```
+
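+ The `model`, `ref_model`, `tokenizer` and `peft_config` objects above are created earlier in the notebook and are not shown in this card. A hedged sketch of what that setup typically looks like in Labonne's DPO notebook follows; the LoRA values are illustrative, not confirmed for this model.
+
+ ```python
+ import torch
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "mlabonne/Marcoro14-7B-slerp"
+ new_model = "kellemar-DPO-Orca-Distilled-7B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Illustrative LoRA settings; the exact rank/targets used here are not documented.
+ peft_config = LoraConfig(
+     r=16,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+ )
+
+ # Policy model to finetune and a frozen reference model for the DPO loss
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
+ ref_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
+ ```
+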
+ ### Training Data
+
+ This was trained with https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs
+
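+ A short sketch (not in the original card) of loading the dataset with the `datasets` library for inspection, before mapping it into the prompt/chosen/rejected format DPOTrainer expects:
+
+ ```python
+ from datasets import load_dataset
+
+ # DPO preference pairs distilled by Argilla from the Intel Orca DPO pairs
+ dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
+
+ print(len(dataset))          # number of preference pairs
+ print(dataset.column_names)  # inspect the fields before building prompt/chosen/rejected
+ ```
+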
+ ### Training Procedure
+
+ Trained with Maxime Labonne's Google Colab notebook on finetuning Mistral 7B with DPO.
+
+ ## Model Card Authors [optional]
+
+ @decruz
+
+ ## Model Card Contact
+
+ @decruz on X/Twitter