---
library_name: transformers
license: mit
datasets:
- DhruvParth/Mistral-7B-Instruct-v2.0-PairRM-DPO-Dataset
language:
- en
---
# Model Card for DhruvParth/Mistral-7B-Instruct-v2.0-PairRM-DPO
This model is a fine-tuned version of Mistral-7B-Instruct-v2.0, trained with Direct Preference Optimization (DPO) to better align its responses with human preferences in a causal language modeling setting.
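For reference, DPO optimizes the policy directly on preference pairs, without training a separate reward model. In the notation of the cited paper, with $\pi_\theta$ the model being fine-tuned, $\pi_{\mathrm{ref}}$ the frozen reference model, $(x, y_w, y_l)$ a prompt with chosen and rejected responses, and $\beta$ the temperature (set to 0.1 in this run via the trainer's `beta` argument), the objective is:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$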
## Model Details
### Model Description
- Developed by: Dhruv Parthasarathy
- Model type: Fine-tuned language model
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: Mistral-7B-Instruct-v2.0
### Model Sources
- **Repository:** https://huggingface.co/DhruvParth
- **Paper:** Direct Preference Optimization (https://arxiv.org/abs/2305.18290)
- **Demo:** (Will soon be made available)
## Uses
This model is tailored for scenarios requiring alignment with human preferences in automated responses, suitable for applications in personalized chatbots, customer support, and other interactive services.
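A minimal inference sketch is given below. It assumes the model can be loaded directly from this repository with `transformers`; the generation settings are illustrative rather than the exact configuration used during development.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DhruvParth/Mistral-7B-Instruct-v2.0-PairRM-DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "Give me three tips for writing clear emails."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```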
## Training Details
### Notebook
The fine-tuning process and the experiments were documented in a Jupyter Notebook, available [here](https://github.com/parthasarathydNU/gen-ai-coursework/blob/main/advanced-llms/direct-preference-optimization/dpomistralfinetuning.ipynb).
### Training Configuration
#### LoRA Configuration
```python
from peft import LoraConfig

# LoRA adapter configuration (passed to the DPOTrainer via peft_config below)
peft_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'v_proj', 'q_proj', 'dense'],
)
```
#### BitsAndBytes Configuration
```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute and double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
```
#### Training Device Setup
```python
# Place the entire model on GPU 0
device_map = {"": 0}
```
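These two settings are consumed when the quantized base model is loaded. A minimal sketch, assuming the base checkpoint identifier and the `bnb_config` variable from the BitsAndBytes block above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint (assumed identifier for the "Mistral-7B-Instruct-v2.0" base listed above)
base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a dedicated pad token

# Load the base model in 4-bit on GPU 0 using the quantization config and device map above
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map=device_map,
)
model.config.use_cache = False  # recommended when gradient checkpointing is enabled
```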
#### Training Arguments
```python
from trl import DPOConfig

# Training hyperparameters: short 50-step run with paged AdamW and a cosine schedule
training_args = DPOConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=50,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,  # output directory name defined earlier in the notebook
    optim="paged_adamw_32bit",
    warmup_steps=5,
)
```
### DPO Trainer Setup
```python
from trl import DPOTrainer

# DPO trainer: wraps the quantized base model with the LoRA adapter defined above
dpo_trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=updated_train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,  # DPO temperature
    max_prompt_length=512,
    max_length=1024,
)
```
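With the trainer in place, the remaining steps are to launch training and save the result. A minimal sketch, assuming the trainer instance above is bound to `dpo_trainer` and `new_model` holds the output directory name used in the training arguments:

```python
# Run the 50-step DPO fine-tuning configured above
dpo_trainer.train()

# Persist the trained LoRA adapter and the tokenizer
dpo_trainer.save_model(new_model)
tokenizer.save_pretrained(new_model)
```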
## Evaluation
Details on the model's performance, evaluation protocols, and results will be provided as they become available.
## Citation
If you use this model or dataset, please cite it as follows:
**BibTeX:**
```bibtex
@misc{dhruvparth_mistral7b_dpo_2024,
author = {Dhruv Parthasarathy},
title = {Fine-tuning LLMs with Direct Preference Optimization},
year = {2024},
  publisher = {Hugging Face},
  howpublished = {Hugging Face model repository},
url = {https://huggingface.co/DhruvParth/Mistral-7B-Instruct-v2.0-PairRM-DPO}
}
```
**APA:**
Dhruv Parthasarathy. (2024). Fine-tuning LLMs with Direct Preference Optimization. Hugging Face model repository. https://huggingface.co/DhruvParth/Mistral-7B-Instruct-v2.0-PairRM-DPO
For any queries or discussions regarding the project, please open an issue in the GitHub repository, post in the Community section of this model page, reach out via LinkedIn (https://www.linkedin.com/in/parthadhruv/), or contact me directly at [email protected].