tags:
- trl
- dpo
- generated_from_trainer
- distilabel
- argilla
base_model: microsoft/phi-2
model-index:
- name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
  results: []
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
language:
- en
pipeline_tag: text-generation
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset.
The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).

It achieves the following results on the evaluation set:
- Loss: 0.0972
- Rewards/chosen: 0.2699
- Rewards/rejected: -5.8246
- Rewards/accuracies: 0.9623
- Rewards/margins: 6.0944
- Logps/rejected: -311.1872
- Logps/chosen: -115.6127
- Logits/rejected: 0.0766
- Logits/chosen: 0.0242
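
For readers unfamiliar with the DPO metrics above, the following is a minimal, dependency-free sketch of how the loss and reward metrics relate to each other. It assumes the sigmoid DPO loss as implemented in TRL's `DPOTrainer` and `beta=0.1` (TRL's default; the value used for this run is not stated in this card). `PairLogps` and `dpo_metrics` are hypothetical names for illustration, and the log-probabilities are made-up numbers, not values from this training run.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairLogps:
    """Summed token log-probabilities for one (chosen, rejected) pair. Hypothetical container."""
    policy_chosen: float
    policy_rejected: float
    reference_chosen: float
    reference_rejected: float


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(pairs: list[PairLogps], beta: float = 0.1) -> dict[str, float]:
    """Batch-averaged sigmoid DPO loss and reward metrics (beta=0.1 is an assumption)."""
    chosen, rejected, losses = [], [], []
    for p in pairs:
        # Implicit rewards: beta-scaled log-ratio of the trained policy vs. the frozen reference.
        r_chosen = beta * (p.policy_chosen - p.reference_chosen)
        r_rejected = beta * (p.policy_rejected - p.reference_rejected)
        # The loss pushes the chosen reward above the rejected one.
        losses.append(-log_sigmoid(r_chosen - r_rejected))
        chosen.append(r_chosen)
        rejected.append(r_rejected)
    return {
        "loss": mean(losses),
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/accuracies": mean(1.0 if c > r else 0.0 for c, r in zip(chosen, rejected)),
        "rewards/margins": mean(c - r for c, r in zip(chosen, rejected)),
    }


# Made-up log-probabilities for two pairs, not data from this run.
pairs = [
    PairLogps(policy_chosen=-100.0, policy_rejected=-300.0, reference_chosen=-102.0, reference_rejected=-250.0),
    PairLogps(policy_chosen=-120.0, policy_rejected=-110.0, reference_chosen=-118.0, reference_rejected=-115.0),
]
metrics = dpo_metrics(pairs)
print(metrics)
```

Because Rewards/margins is the mean of (chosen reward minus rejected reward), it equals the difference of the two reported means: 0.2699 - (-5.8246) ≈ 6.094, matching the value above.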

## Model description

The adapter was fine-tuned with DPO on a Google Colab A100 GPU using the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. To scale LoRA-based approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).

You can try out the model with the snippet below. It loads phi-2, applies the LoRA adapter on top of it, and quantizes the base model to 4 bits with a bitsandbytes config, but only when CUDA is available.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)
from peft import PeftModel

BASE_MODEL_ID = "microsoft/phi-2"
ADAPTER_ID = "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"

# template used for fine-tuning
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    # 4-bit NF4 quantization of the base model (requires bitsandbytes and a CUDA GPU)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_use_double_quant=False,
    )
    dtype = torch.float16
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
    dtype = torch.float16
else:
    device = torch.device("cpu")
    bnb_config = None
    dtype = torch.float32  # float16 matmuls are poorly supported on CPU
    print("No GPU available, using CPU instead.")

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=dtype,
    quantization_config=bnb_config,
    # bitsandbytes models are placed on the GPU at load time and must not be moved with .to()
    device_map="auto" if bnb_config is not None else None,
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
if bnb_config is None:
    model = model.to(device)

prompt = "Instruct: What is the capital of France?\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

# Without max_new_tokens, generate() stops at a short default length
outputs = model.generate(**inputs, max_new_tokens=64)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
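
Because the adapter was trained on `Instruct: ...` / `Output: ...` prompts, it can help to wrap prompt building and response extraction in small helpers. A minimal sketch follows; `build_prompt` and `extract_response` are hypothetical helpers (not part of transformers or peft), and they assume the single-newline format used in the prompt above and `<|endoftext|>` as phi-2's end-of-sequence token.

```python
EOS_TOKEN = "<|endoftext|>"  # assumption: phi-2's end-of-sequence token


def build_prompt(instruction: str) -> str:
    """Format an instruction the same way as the example prompt above."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def extract_response(decoded: str, prompt: str) -> str:
    """Return only the generated answer from a decoded sequence."""
    # Decoder-only models return the prompt tokens followed by the new tokens.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Stop at the EOS token, or at a new instruction the model may have started on its own.
    text = text.split(EOS_TOKEN)[0].split("\nInstruct:")[0]
    return text.strip()


prompt = build_prompt("What is the capital of France?")
decoded = prompt + " The capital of France is Paris.<|endoftext|>"
print(extract_response(decoded, prompt))  # The capital of France is Paris.
```

In the snippet above, `tokenizer.batch_decode(outputs)[0]` could be passed as `decoded`; alternatively, `batch_decode(..., skip_special_tokens=True)` drops the EOS token during decoding.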

## Intended uses & limitations

This is a LoRA adapter fine-tuned for phi-2, not a full fine-tune of the model. Additionally, I did not spend time tuning the hyperparameters.

## Training and evaluation data

The adapter was fine-tuned with DPO on a Google Colab A100 GPU using the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing). The configurations used for the adapter and the trainer are shown below.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,  # the low-rank update is scaled by lora_alpha / r
    lora_dropout=0.5,
    r=32,  # rank of the low-rank update matrices
    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],  # attention projections and MLP layers
    bias="none",
    task_type="CAUSAL_LM",
)
```
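
To make the LoRA configuration above more concrete, the following sketch works out two numbers it implies: the factor `lora_alpha / r` that scales each low-rank update (`W + (lora_alpha / r) * B @ A`, with `A` of shape r × in_features and `B` of shape out_features × r), and the resulting number of trainable adapter parameters. The phi-2 layer shapes (32 decoder layers, hidden size 2560, MLP intermediate size 10240) come from phi-2's public model configuration rather than from this card, so treat them as assumptions; with peft, calling `print_trainable_parameters()` on the `PeftModel` reports the actual count.

```python
# Assumed phi-2 dimensions (from its public model config, not from this card).
NUM_LAYERS = 32
HIDDEN_SIZE = 2560
INTERMEDIATE_SIZE = 10240

# (in_features, out_features) of each module in target_modules, per decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "fc1": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "fc2": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update: W_eff = W + (lora_alpha / r) * B @ A."""
    return lora_alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters of one adapted linear layer (A: r x in, B: out x r; bias='none')."""
    return r * (in_features + out_features)


def trainable_params(r: int) -> int:
    """Total adapter parameters across all decoder layers."""
    per_layer = sum(lora_params(i, o, r) for i, o in TARGET_SHAPES.values())
    return NUM_LAYERS * per_layer


print(lora_scaling(lora_alpha=16, r=32))  # 0.5
print(trainable_params(r=32))  # 41943040
```

With `lora_alpha` at half of `r`, each update is scaled by 0.5, and the roughly 42M adapter parameters are under 2% of phi-2's roughly 2.7B parameters.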

```python
from transformers import TrainingArguments

model_name = "phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"  # hypothetical: the notebook's value is not shown here

training_arguments = TrainingArguments(
    output_dir=f"./{model_name}",
    evaluation_strategy="steps",
    do_eval=True,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    log_level="debug",
    save_steps=20,
    logging_steps=20,
    learning_rate=1e-5,
    eval_steps=20,
    num_train_epochs=1, # Modified for tutorial purposes
    max_steps=100,  # when set (> 0), takes precedence over num_train_epochs
    warmup_steps=20,
    lr_scheduler_type="linear",
)
```
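
The trainer settings above imply an effective batch size and a learning-rate schedule. The sketch below computes both, assuming a single GPU (the card mentions one Colab A100) and the linear warmup-then-decay schedule that `lr_scheduler_type="linear"` selects in transformers. `effective_batch_size` and `linear_warmup_lr` are hypothetical helper names.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples (here: preference pairs) consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(effective_batch_size(per_device=2, grad_accum=16))  # 32
for step in (0, 10, 20, 60, 100):
    lr = linear_warmup_lr(step, base_lr=1e-5, warmup_steps=20, total_steps=100)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

`total_steps` should be the number of optimizer steps actually run: 100 with `max_steps=100` as written, whereas the results table reaches step 320 at epoch 1.0. At an effective batch of 32 pairs, 320 steps correspond to roughly 10k preference pairs per epoch.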

## Training procedure

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6805 | 0.06 | 20 | 0.6540 | 0.0096 | -0.0728 | 0.8367 | 0.0824 | -253.6698 | -118.2153 | 0.3760 | 0.3395 |
| 0.5821 | 0.12 | 40 | 0.4977 | 0.0383 | -0.4385 | 0.9199 | 0.4768 | -257.3268 | -117.9285 | 0.3836 | 0.3356 |
| 0.4163 | 0.19 | 60 | 0.3225 | 0.0641 | -1.1656 | 0.9257 | 1.2298 | -264.5979 | -117.6701 | 0.3836 | 0.3192 |
| 0.275 | 0.25 | 80 | 0.2245 | 0.0476 | -2.1180 | 0.9316 | 2.1656 | -274.1212 | -117.8351 | 0.3399 | 0.2698 |
| 0.1808 | 0.31 | 100 | 0.1771 | -0.0012 | -3.2019 | 0.9366 | 3.2007 | -284.9609 | -118.3238 | 0.2615 | 0.1964 |
| 0.1405 | 0.37 | 120 | 0.1528 | 0.0185 | -4.0396 | 0.9425 | 4.0581 | -293.3371 | -118.1262 | 0.1983 | 0.1407 |
| 0.1121 | 0.44 | 140 | 0.1389 | 0.0285 | -4.6518 | 0.9471 | 4.6802 | -299.4591 | -118.0267 | 0.1493 | 0.0980 |
| 0.1544 | 0.5 | 160 | 0.1289 | 0.0745 | -4.9025 | 0.9506 | 4.9771 | -301.9670 | -117.5659 | 0.1257 | 0.0785 |
| 0.1594 | 0.56 | 180 | 0.1204 | 0.1435 | -4.8770 | 0.9561 | 5.0205 | -301.7119 | -116.8765 | 0.1168 | 0.0696 |
| 0.0988 | 0.62 | 200 | 0.1136 | 0.1830 | -5.1569 | 0.9576 | 5.3400 | -304.5108 | -116.4809 | 0.1078 | 0.0579 |
| 0.1141 | 0.68 | 220 | 0.1080 | 0.2052 | -5.4532 | 0.9580 | 5.6584 | -307.4731 | -116.2591 | 0.0962 | 0.0460 |
| 0.0943 | 0.75 | 240 | 0.1037 | 0.2326 | -5.6061 | 0.9592 | 5.8387 | -309.0026 | -115.9850 | 0.0913 | 0.0393 |
| 0.1108 | 0.81 | 260 | 0.1008 | 0.2500 | -5.7399 | 0.9607 | 5.9900 | -310.3409 | -115.8109 | 0.0827 | 0.0316 |
| 0.1088 | 0.87 | 280 | 0.0987 | 0.2677 | -5.7068 | 0.9619 | 5.9745 | -310.0096 | -115.6346 | 0.0825 | 0.0301 |
| 0.0741 | 0.93 | 300 | 0.0975 | 0.2701 | -5.7873 | 0.9623 | 6.0574 | -310.8145 | -115.6102 | 0.0788 | 0.0261 |
| 0.1059 | 1.0 | 320 | 0.0972 | 0.2699 | -5.8246 | 0.9623 | 6.0944 | -311.1872 | -115.6127 | 0.0766 | 0.0242 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.37.1
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
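
To reproduce these results, it may help to check the installed packages against the versions listed above. A minimal sketch using only the standard library's `importlib.metadata`; `compare_versions` is a hypothetical helper, and the local `+cu121` suffix of the PyTorch version depends on which CUDA build of PyTorch is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "peft": "0.7.1",
    "transformers": "4.37.1",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for each package that differs or is missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


mismatches = compare_versions(EXPECTED)
for package, (wanted, installed) in mismatches.items():
    print(f"{package}: card lists {wanted}, found {installed}")
```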