argilla
/

phi2-lora-distilabel-intel-orca-dpo-pairs

@@ -5,40 +5,126 @@ tags:
 - trl
 - dpo
 - generated_from_trainer
 base_model: microsoft/phi-2
 model-index:
-- name: phi2-lora-distilabel-intel-orca-dpo-pairs
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# phi2-lora-distilabel-intel-orca-dpo-pairs
-This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4537
-- Rewards/chosen: -0.0837
-- Rewards/rejected: -1.2628
-- Rewards/accuracies: 0.8301
-- Rewards/margins: 1.1791
-- Logps/rejected: -224.8409
-- Logps/chosen: -203.2228
-- Logits/rejected: 0.4773
-- Logits/chosen: 0.3062
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -60,28 +146,28 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6853        | 0.06  | 20   | 0.6701          | 0.0133         | -0.0368          | 0.6905             | 0.0501          | -212.5803      | -202.2522    | 0.3853          | 0.2532        |
-| 0.6312        | 0.12  | 40   | 0.5884          | 0.0422         | -0.2208          | 0.8138             | 0.2630          | -214.4207      | -201.9638    | 0.4254          | 0.2816        |
-| 0.547         | 0.19  | 60   | 0.5146          | 0.0172         | -0.5786          | 0.8278             | 0.5958          | -217.9983      | -202.2132    | 0.4699          | 0.3110        |
-| 0.4388        | 0.25  | 80   | 0.4893          | -0.0808        | -1.0789          | 0.8293             | 0.9981          | -223.0014      | -203.1934    | 0.5158          | 0.3396        |
-| 0.4871        | 0.31  | 100  | 0.4818          | -0.1298        | -1.2346          | 0.8297             | 1.1048          | -224.5586      | -203.6837    | 0.5133          | 0.3340        |
-| 0.4863        | 0.37  | 120  | 0.4723          | -0.1230        | -1.1718          | 0.8301             | 1.0488          | -223.9305      | -203.6159    | 0.4910          | 0.3167        |
-| 0.4578        | 0.44  | 140  | 0.4666          | -0.1257        | -1.1772          | 0.8301             | 1.0515          | -223.9844      | -203.6428    | 0.4795          | 0.3078        |
-| 0.4587        | 0.5   | 160  | 0.4625          | -0.0746        | -1.1272          | 0.8301             | 1.0526          | -223.4841      | -203.1310    | 0.4857          | 0.3139        |
-| 0.4688        | 0.56  | 180  | 0.4595          | -0.0584        | -1.1194          | 0.8297             | 1.0610          | -223.4062      | -202.9692    | 0.4890          | 0.3171        |
-| 0.4189        | 0.62  | 200  | 0.4579          | -0.0666        | -1.1647          | 0.8297             | 1.0982          | -223.8598      | -203.0511    | 0.4858          | 0.3138        |
-| 0.4392        | 0.68  | 220  | 0.4564          | -0.0697        | -1.1915          | 0.8301             | 1.1219          | -224.1278      | -203.0823    | 0.4824          | 0.3110        |
-| 0.4659        | 0.75  | 240  | 0.4554          | -0.0826        | -1.2245          | 0.8301             | 1.1419          | -224.4574      | -203.2112    | 0.4761          | 0.3052        |
-| 0.4075        | 0.81  | 260  | 0.4544          | -0.0823        | -1.2328          | 0.8301             | 1.1504          | -224.5403      | -203.2089    | 0.4749          | 0.3044        |
-| 0.4015        | 0.87  | 280  | 0.4543          | -0.0833        | -1.2590          | 0.8301             | 1.1757          | -224.8026      | -203.2188    | 0.4779          | 0.3067        |
-| 0.4365        | 0.93  | 300  | 0.4539          | -0.0846        | -1.2658          | 0.8301             | 1.1812          | -224.8702      | -203.2313    | 0.4780          | 0.3067        |
-| 0.4589        | 1.0   | 320  | 0.4537          | -0.0837        | -1.2628          | 0.8301             | 1.1791          | -224.8409      | -203.2228    | 0.4773          | 0.3062        |
 ### Framework versions
 - PEFT 0.7.1
 - Transformers 4.37.1
-- Pytorch 2.1.0+cu118
 - Datasets 2.16.1
 - Tokenizers 0.15.1

 - trl
 - dpo
 - generated_from_trainer
+- distilabel
+- argilla
 base_model: microsoft/phi-2
 model-index:
+- name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
   results: []
+datasets:
+- argilla/distilabel-intel-orca-dpo-pairs
+language:
+- en
+pipeline_tag: text-generation
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
+This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
+The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).
 It achieves the following results on the evaluation set:
+- Loss: 0.0972
+- Rewards/chosen: 0.2699
+- Rewards/rejected: -5.8246
+- Rewards/accuracies: 0.9623
+- Rewards/margins: 6.0944
+- Logps/rejected: -311.1872
+- Logps/chosen: -115.6127
+- Logits/rejected: 0.0766
+- Logits/chosen: 0.0242
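As a quick sanity check on the metrics above, the reported reward margin should equal the chosen reward minus the rejected reward. The sketch below verifies that and shows the per-pair sigmoid DPO loss, `-log(sigmoid(margin))`, assuming the logged rewards already include the β scaling (as TRL appears to log them). The name `dpo_pair_loss` is illustrative only. Because the reported loss is averaged per example, plugging the mean margin into this function is not expected to reproduce 0.0972.

```python
import math

# Final evaluation metrics copied from the list above
rewards_chosen = 0.2699
rewards_rejected = -5.8246
reported_margin = 6.0944

# The margin is the gap between the chosen and rejected rewards
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-3  # equal up to rounding


def dpo_pair_loss(margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed stably."""
    # -log(sigmoid(m)) == log(1 + exp(-m))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With no preference between the two answers the loss is log(2)
print(round(dpo_pair_loss(0.0), 4))  # 0.6931
```

The first logged validation loss (0.6540) sits just below this `log(2)` starting point, which is what one would expect early in training.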
 ## Model description
+The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. To scale LoRA approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).
+You can try the model with the code below. It loads the LoRA adapter and a bitsandbytes quantization config (the latter only when CUDA is available).
+```python
+import torch
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    BitsAndBytesConfig,
+)
+from peft import PeftConfig, PeftModel
+# template used for fine-tune
+# template = """\
+# Instruct: {instruction}\n
+# Output: {response}"""
+if torch.cuda.is_available():
+    device = torch.device("cuda")
+    print(f"Using {torch.cuda.get_device_name(0)}")
+    bnb_config = BitsAndBytesConfig(
+        load_in_4bit=True,
+        bnb_4bit_quant_type='nf4',
+        bnb_4bit_compute_dtype='float16',
+        bnb_4bit_use_double_quant=False,
+    )
+elif torch.backends.mps.is_available():
+    device = torch.device("mps")
+    bnb_config = None
+else:
+    device = torch.device("cpu")
+    bnb_config = None
+    print("No GPU available, using CPU instead.")
+config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
+# Load the base model and its tokenizer, then attach the LoRA adapter
+tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
+model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs").to(device)
+prompt = "Instruct: What is the capital of France? \nOutput:"
+# Move the input tensors to the same device as the model
+inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)
+outputs = model.generate(**inputs)
+text = tokenizer.batch_decode(outputs)[0]
+print(text)
+```
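The snippet above leaves the decoded text as the prompt followed by the completion. Here is a minimal sketch of two helpers for building the prompt and trimming the output, assuming the single-line `Instruct: ...\nOutput:` prompt shown above and `<|endoftext|>` as phi-2's end-of-text marker. The names `build_prompt` and `extract_response` are hypothetical and not part of any library.

```python
EOS_MARKER = "<|endoftext|>"  # assumed end-of-text token string for phi-2


def build_prompt(instruction: str) -> str:
    """Format an instruction in the Instruct/Output style used above."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def extract_response(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt and anything after the end-of-text marker."""
    completion = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return completion.split(EOS_MARKER, 1)[0].strip()


prompt = build_prompt("What is the capital of France?")
# A made-up decoded string standing in for real model output
fake_decoded = prompt + " Paris." + EOS_MARKER
print(extract_response(fake_decoded, prompt))  # Paris.
```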
 ## Intended uses & limitations
+This is a LoRA adapter fine-tuned for phi-2, not a full fine-tune of the model. Additionally, I did not spend time tuning the hyperparameters.
 ## Training and evaluation data
+The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing). The configs for the adapter and the trainer are shown below.
+```python
+from peft import LoraConfig
+peft_config = LoraConfig(
+    lora_alpha=16,
+    lora_dropout=0.5,
+    r=32,
+    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
+    bias="none",
+    task_type="CAUSAL_LM",
+)
+```
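To get a feel for what this adapter config trains, the sketch below computes the LoRA scaling factor (`lora_alpha / r`) and a rough count of adapter weights, using `r * (d_in + d_out)` per targeted linear layer. The layer shapes (hidden size 2560, MLP size 10240, 32 decoder layers) are assumptions about phi-2 and should be checked against the model's own config; `lora_params` is a hypothetical helper.

```python
lora_alpha, r = 16, 32
scaling = lora_alpha / r  # factor applied to the low-rank update

# Assumed phi-2 dimensions (not stated in this model card)
HIDDEN, MLP, N_LAYERS = 2560, 10240, 32

# (input features, output features) for each targeted module
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, MLP),
    "fc2": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank matrices A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in target_shapes.values())
total = per_layer * N_LAYERS
print(scaling, per_layer, total)  # 0.5 1310720 41943040
```

Under these assumptions the adapter holds roughly 42M trainable weights, a small fraction of the base model.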
+```python
+from transformers import TrainingArguments
+training_arguments = TrainingArguments(
+    output_dir=f"./{model_name}",
+    evaluation_strategy="steps",
+    do_eval=True,
+    optim="paged_adamw_8bit",
+    per_device_train_batch_size=2,
+    gradient_accumulation_steps=16,
+    per_device_eval_batch_size=2,
+    log_level="debug",
+    save_steps=20,
+    logging_steps=20,
+    learning_rate=1e-5,
+    eval_steps=20,
+    num_train_epochs=1, # Modified for tutorial purposes
+    max_steps=100,
+    warmup_steps=20,
+    lr_scheduler_type="linear",
+)
+```
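Two quantities follow directly from these arguments: the effective batch size (per-device batch size times gradient accumulation steps) and the learning rate under linear warmup followed by linear decay. The sketch below assumes a single GPU. `total_steps` is left as a parameter because the config sets `max_steps=100` while the results table reports 320 steps. `linear_lr` is a hypothetical helper that mirrors the usual linear schedule; it is not the Transformers scheduler itself.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 16

# Number of preference pairs contributing to each optimizer step (single GPU assumed)
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch)  # 32


def linear_lr(step: int, base_lr: float = 1e-5,
              warmup_steps: int = 20, total_steps: int = 100) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points of the schedule
for step in (0, 10, 20, 60, 100):
    print(step, linear_lr(step))
```

With `warmup_steps=20`, the peak learning rate is reached exactly at the first evaluation step, since `eval_steps` is also 20.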
 ## Training procedure
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6805        | 0.06  | 20   | 0.6540          | 0.0096         | -0.0728          | 0.8367             | 0.0824          | -253.6698      | -118.2153    | 0.3760          | 0.3395        |
+| 0.5821        | 0.12  | 40   | 0.4977          | 0.0383         | -0.4385          | 0.9199             | 0.4768          | -257.3268      | -117.9285    | 0.3836          | 0.3356        |
+| 0.4163        | 0.19  | 60   | 0.3225          | 0.0641         | -1.1656          | 0.9257             | 1.2298          | -264.5979      | -117.6701    | 0.3836          | 0.3192        |
+| 0.275         | 0.25  | 80   | 0.2245          | 0.0476         | -2.1180          | 0.9316             | 2.1656          | -274.1212      | -117.8351    | 0.3399          | 0.2698        |
+| 0.1808        | 0.31  | 100  | 0.1771          | -0.0012        | -3.2019          | 0.9366             | 3.2007          | -284.9609      | -118.3238    | 0.2615          | 0.1964        |
+| 0.1405        | 0.37  | 120  | 0.1528          | 0.0185         | -4.0396          | 0.9425             | 4.0581          | -293.3371      | -118.1262    | 0.1983          | 0.1407        |
+| 0.1121        | 0.44  | 140  | 0.1389          | 0.0285         | -4.6518          | 0.9471             | 4.6802          | -299.4591      | -118.0267    | 0.1493          | 0.0980        |
+| 0.1544        | 0.5   | 160  | 0.1289          | 0.0745         | -4.9025          | 0.9506             | 4.9771          | -301.9670      | -117.5659    | 0.1257          | 0.0785        |
+| 0.1594        | 0.56  | 180  | 0.1204          | 0.1435         | -4.8770          | 0.9561             | 5.0205          | -301.7119      | -116.8765    | 0.1168          | 0.0696        |
+| 0.0988        | 0.62  | 200  | 0.1136          | 0.1830         | -5.1569          | 0.9576             | 5.3400          | -304.5108      | -116.4809    | 0.1078          | 0.0579        |
+| 0.1141        | 0.68  | 220  | 0.1080          | 0.2052         | -5.4532          | 0.9580             | 5.6584          | -307.4731      | -116.2591    | 0.0962          | 0.0460        |
+| 0.0943        | 0.75  | 240  | 0.1037          | 0.2326         | -5.6061          | 0.9592             | 5.8387          | -309.0026      | -115.9850    | 0.0913          | 0.0393        |
+| 0.1108        | 0.81  | 260  | 0.1008          | 0.2500         | -5.7399          | 0.9607             | 5.9900          | -310.3409      | -115.8109    | 0.0827          | 0.0316        |
+| 0.1088        | 0.87  | 280  | 0.0987          | 0.2677         | -5.7068          | 0.9619             | 5.9745          | -310.0096      | -115.6346    | 0.0825          | 0.0301        |
+| 0.0741        | 0.93  | 300  | 0.0975          | 0.2701         | -5.7873          | 0.9623             | 6.0574          | -310.8145      | -115.6102    | 0.0788          | 0.0261        |
+| 0.1059        | 1.0   | 320  | 0.0972          | 0.2699         | -5.8246          | 0.9623             | 6.0944          | -311.1872      | -115.6127    | 0.0766          | 0.0242        |
 ### Framework versions
 - PEFT 0.7.1
 - Transformers 4.37.1
+- Pytorch 2.1.0+cu121
 - Datasets 2.16.1
 - Tokenizers 0.15.1