---
library_name: transformers
tags:
- trl
- dpo
---

# Model Card for Llama-3-8B Orca-DPO

## Model Details

Llama-3-8B fine-tuned on the Orca-DPO dataset using Direct Preference Optimization (DPO).
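
A minimal loading sketch, assuming the fine-tuned weights are published as a PEFT LoRA adapter on top of `meta-llama/Meta-Llama-3-8B`; the adapter repository id below is a placeholder, not an actual model id.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"           # assumed base model
adapter_id = "your-username/llama3-8b-orca-dpo"  # placeholder adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the DPO-trained LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain Direct Preference Optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
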

## Training Details

### Training Data

Trained on the Orca-DPO dataset: preference pairs consisting of a prompt, a chosen (preferred) response, and a rejected response.
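
As a sketch of the expected data layout, the preference pairs can be mapped to the `prompt` / `chosen` / `rejected` fields that trl's `DPOTrainer` consumes; the dataset id `Intel/orca_dpo_pairs` and its column names are assumptions, since this card only names the Orca-DPO dataset.

```python
from datasets import load_dataset

# Assumed dataset id and column names.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_preference_format(example):
    # DPOTrainer expects one prompt plus a preferred and a dispreferred completion.
    return {
        "prompt": example["question"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dataset = dataset.map(to_preference_format, remove_columns=dataset.column_names)
```
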

### Training Procedure

NEFTune noise is added to the token embeddings for robustness, and the model is fine-tuned with the TRL DPO trainer.
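
A condensed sketch of this procedure, assuming trl's `DPOTrainer` on top of a 4-bit (QLoRA-style) base model. NEFTune is switched on through the `neftune_noise_alpha` training argument; the alpha value of 5, the DPO `beta` of 0.1, the dataset id, and the output directory are assumptions not stated in this card, and keyword names vary somewhat across trl releases (newer versions take `processing_class` instead of `tokenizer`).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Meta-Llama-3-8B"  # assumed base model

# Load the policy model in 4-bit NF4 so the 8B model can be tuned on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Preference pairs in prompt/chosen/rejected form (see Training Data above).
raw = load_dataset("Intel/orca_dpo_pairs", split="train")  # dataset id assumed
train_dataset = raw.map(
    lambda x: {"prompt": x["question"], "chosen": x["chosen"], "rejected": x["rejected"]},
    remove_columns=raw.column_names,
)

# LoRA adapter; values from the hyperparameters listed below.
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

training_args = DPOConfig(
    output_dir="llama3-8b-orca-dpo",  # assumed
    beta=0.1,                         # DPO temperature (assumed default)
    neftune_noise_alpha=5,            # enables NEFTune embedding noise (alpha assumed)
    # remaining optimizer settings as listed in the hyperparameters section below
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,      # `processing_class` in newer trl releases
    peft_config=peft_config,  # the LoRA adapter is created inside the trainer
)
trainer.train()
```
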

#### Training Hyperparameters

- lora_alpha = 16
- lora_r = 64
- lora_dropout = 0.1
- adam_beta1 = 0.9
- adam_beta2 = 0.999
- weight_decay = 0.001
- max_grad_norm = 0.3
- learning_rate = 2e-4
- bnb_4bit_quant_type = "nf4"
- optim = "paged_adamw_32bit"
- max_steps = 5000
- gradient_accumulation_steps = 4
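
As a sketch, these values map onto the usual config objects of a QLoRA + trl setup roughly as follows; the compute dtype and the output directory are assumptions, not values listed above.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import DPOConfig

# LoRA adapter settings
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# 4-bit quantization of the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed; not listed above
)

# Optimizer / trainer settings (DPOConfig subclasses transformers.TrainingArguments)
training_args = DPOConfig(
    output_dir="llama3-8b-orca-dpo",  # assumed
    learning_rate=2e-4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    weight_decay=0.001,
    max_grad_norm=0.3,
    optim="paged_adamw_32bit",
    max_steps=5000,
    gradient_accumulation_steps=4,
)
```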