---
license: mit
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
- distilabel
- argilla
base_model: microsoft/phi-2
model-index:
- name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
  results: []
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
language:
- en
pipeline_tag: text-generation
---

# phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs). The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).

It achieves the following results on the evaluation set:
- Loss: 0.0972
- Rewards/chosen: 0.2699
- Rewards/rejected: -5.8246
- Rewards/accuracies: 0.9623
- Rewards/margins: 6.0944
- Logps/rejected: -311.1872
- Logps/chosen: -115.6127
- Logits/rejected: 0.0766
- Logits/chosen: 0.0242

## Model description

The adapter was fine-tuned with DPO on a Google Colab A100 GPU using the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. If you want to scale LoRA approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).

You can play around with the model as shown below. We load the LoRA adapter and a bitsandbytes config (only when CUDA is available).
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftModel

# template used for fine-tuning
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype="float16",
        bnb_4bit_use_double_quant=False,
    )
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
else:
    device = torch.device("cpu")
    bnb_config = None
    print("No GPU available, using CPU instead.")

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16 if device.type != "cpu" else torch.float32,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
if bnb_config is None:
    # 4-bit quantized models are placed on the GPU at load time and do not support .to()
    model = model.to(device)

prompt = "Instruct: What is the capital of France? \nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

## Intended uses & limitations

This is a LoRA adapter fine-tune for phi-2, not a full fine-tune of the model. Additionally, I did not spend much time tuning hyperparameters.

## Training and evaluation data

The adapter was fine-tuned with DPO on a Google Colab A100 GPU using the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing). Below are the configs used for the adapter and the trainer.
```python
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.5,
    r=32,
    target_modules=["k_proj", "q_proj", "v_proj", "fc1", "fc2"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

```python
from transformers import TrainingArguments

model_name = "phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"

training_arguments = TrainingArguments(
    output_dir=f"./{model_name}",
    evaluation_strategy="steps",
    do_eval=True,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    log_level="debug",
    save_steps=20,
    logging_steps=20,
    learning_rate=1e-5,
    eval_steps=20,
    num_train_epochs=1,
    # Modified for tutorial purposes max_steps=100,
    warmup_steps=20,
    lr_scheduler_type="linear",
)
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 20
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6805 | 0.06 | 20 | 0.6540 | 0.0096 | -0.0728 | 0.8367 | 0.0824 | -253.6698 | -118.2153 | 0.3760 | 0.3395 |
| 0.5821 | 0.12 | 40 | 0.4977 | 0.0383 | -0.4385 | 0.9199 | 0.4768 | -257.3268 | -117.9285 | 0.3836 | 0.3356 |
| 0.4163 | 0.19 | 60 | 0.3225 | 0.0641 | -1.1656 | 0.9257 | 1.2298 | -264.5979 | -117.6701 | 0.3836 | 0.3192 |
| 0.275 | 0.25 | 80 | 0.2245 | 0.0476 | -2.1180 | 0.9316 | 2.1656 | -274.1212 | -117.8351 | 0.3399 | 0.2698 |
| 0.1808 | 0.31 | 100 | 0.1771 | -0.0012 | -3.2019 | 0.9366 | 3.2007 | -284.9609 | -118.3238 | 0.2615 | 0.1964 |
| 0.1405 | 0.37 | 120 | 0.1528 | 0.0185 | -4.0396 | 0.9425 | 4.0581 | -293.3371 | -118.1262 | 0.1983 | 0.1407 |
| 0.1121 | 0.44 | 140 | 0.1389 | 0.0285 | -4.6518 | 0.9471 | 4.6802 | -299.4591 | -118.0267 | 0.1493 | 0.0980 |
| 0.1544 | 0.5 | 160 | 0.1289 | 0.0745 | -4.9025 | 0.9506 | 4.9771 | -301.9670 | -117.5659 | 0.1257 | 0.0785 |
| 0.1594 | 0.56 | 180 | 0.1204 | 0.1435 | -4.8770 | 0.9561 | 5.0205 | -301.7119 | -116.8765 | 0.1168 | 0.0696 |
| 0.0988 | 0.62 | 200 | 0.1136 | 0.1830 | -5.1569 | 0.9576 | 5.3400 | -304.5108 | -116.4809 | 0.1078 | 0.0579 |
| 0.1141 | 0.68 | 220 | 0.1080 | 0.2052 | -5.4532 | 0.9580 | 5.6584 | -307.4731 | -116.2591 | 0.0962 | 0.0460 |
| 0.0943 | 0.75 | 240 | 0.1037 | 0.2326 | -5.6061 | 0.9592 | 5.8387 | -309.0026 | -115.9850 | 0.0913 | 0.0393 |
| 0.1108 | 0.81 | 260 | 0.1008 | 0.2500 | -5.7399 | 0.9607 | 5.9900 | -310.3409 | -115.8109 | 0.0827 | 0.0316 |
| 0.1088 | 0.87 | 280 | 0.0987 | 0.2677 | -5.7068 | 0.9619 | 5.9745 | -310.0096 | -115.6346 | 0.0825 | 0.0301 |
| 0.0741 | 0.93 | 300 | 0.0975 | 0.2701 | -5.7873 | 0.9623 | 6.0574 | -310.8145 | -115.6102 | 0.0788 | 0.0261 |
| 0.1059 | 1.0 | 320 | 0.0972 | 0.2699 | -5.8246 | 0.9623 | 6.0944 | -311.1872 | -115.6127 | 0.0766 | 0.0242 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.37.1
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
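For reference, DPO training needs the dataset's pairs mapped to prompt/chosen/rejected columns in the same `Instruct:`/`Output:` template used at inference. The helper below is a hypothetical sketch of that preprocessing, not the notebook's exact code: the function name and the leading-space convention are my assumptions, while the `input`/`chosen`/`rejected` column names come from the distilabel-intel-orca-dpo-pairs dataset.

```python
def to_dpo_format(row: dict) -> dict:
    """Map one distilabel-intel-orca-dpo-pairs row to the prompt/chosen/rejected
    columns a DPO trainer expects, using the card's Instruct/Output template."""
    return {
        # the prompt ends right after "Output:"; the completions supply the space
        "prompt": f"Instruct: {row['input']}\nOutput:",
        "chosen": " " + row["chosen"],
        "rejected": " " + row["rejected"],
    }

example = to_dpo_format(
    {"input": "What is the capital of France?", "chosen": "Paris.", "rejected": "Lyon."}
)
print(example["prompt"])
```

With the `datasets` library, this kind of mapping would typically be applied via `dataset.map(to_dpo_format)` before the columns are handed to the trainer.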