---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-6
  results: []
---
|
|
|
|
|
|
# zephyr-dpop-qlora-uf-ours-5e-6
|
|
|
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF dataset.
It achieves the following results on the evaluation set:
- Loss: 5.1264
- Positive Losses: 43.1884
- Dpo Losses: 0.6101
- Rewards/chosen: -0.3903
- Rewards/rejected: -0.7274
- Rewards/accuracies: 0.6670
- Rewards/margins: 0.3370
- Rewards/margins Max: 1.4167
- Rewards/margins Min: -0.8378
- Rewards/margins Std: 0.7707
- Logps/rejected: -331.3143
- Logps/chosen: -323.6263
- Logits/rejected: -2.4808
- Logits/chosen: -2.5277
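The separate "Dpo Losses" and "Positive Losses" entries, together with the "dpop" in the model name, suggest a DPO-Positive (DPOP) style objective, which adds a penalty whenever the policy's log-likelihood of the chosen response falls below the reference model's. A minimal sketch under that assumption follows; the function name and the `beta`/`lam` values are illustrative placeholders, not the settings used for this run:

```python
import torch
import torch.nn.functional as F

def dpop_loss(pi_chosen_logps, pi_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.1, lam=50.0):
    """Sketch of a DPO-Positive loss: the standard DPO logits minus a
    penalty that fires when the policy underperforms the reference
    on the chosen (positive) responses."""
    dpo_logits = (pi_chosen_logps - ref_chosen_logps) - (pi_rejected_logps - ref_rejected_logps)
    # "Positive" penalty: only active when ref assigns higher likelihood to chosen
    positive_penalty = torch.clamp(ref_chosen_logps - pi_chosen_logps, min=0.0)
    return -F.logsigmoid(beta * (dpo_logits - lam * positive_penalty))
```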
|
|
|
## Model description
|
|
|
This appears to be a QLoRA (PEFT/LoRA) adapter for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), preference-tuned with TRL on the generation/UF dataset. The model name and the reported "Positive Losses" metric suggest a DPOP-style objective rather than plain DPO; beyond that, more information is needed. A loading and generation sketch is given in the next section.
|
|
|
## Intended uses & limitations
|
|
|
As a preference-tuned adapter over a Zephyr SFT base, it is presumably intended for assistant-style chat; specific intended uses and limitations have not been documented.
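A minimal loading-and-generation sketch, assuming the adapter is published or saved under a path like the one below (the repo id is a placeholder) and that the base model's chat template applies:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "your-namespace/zephyr-dpop-qlora-uf-ours-5e-6"  # placeholder repo id
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

# Format a single-turn conversation with the base model's chat template
messages = [{"role": "user", "content": "What is preference optimization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```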
|
|
|
## Training and evaluation data
|
|
|
Training and evaluation both used the generation/UF dataset named in the metadata (presumably an UltraFeedback-derived preference dataset); no further details are documented.
|
|
|
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
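Expressed as a `transformers.TrainingArguments` sketch, the settings above would look roughly like the following; the `output_dir` and the precision flag are assumptions, and DPO-specific options (such as the `beta` coefficient) live in the trainer and are not recorded here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-5e-6",  # assumption
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: mixed precision is typical for QLoRA but not recorded above
)
```

With 2 GPUs, a per-device batch of 4, and 2 gradient-accumulation steps, the effective train batch size is 4 × 2 × 2 = 16, matching the totals listed above.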
|
|
|
### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6302        | 0.28  | 100  | 0.8170          | 1.2658          | 0.6732     | 0.0877         | 0.0421           | 0.5920             | 0.0456          | 0.2717              | -0.1472             | 0.1389              | -254.3717      | -275.8216    | -2.6655         | -2.7015       |
| 0.5709        | 0.56  | 200  | 2.1527          | 14.1341         | 0.6518     | -0.0932        | -0.2071          | 0.6360             | 0.1139          | 0.6302              | -0.3647             | 0.3297              | -279.2877      | -293.9101    | -2.6591         | -2.6989       |
| 0.4758        | 0.85  | 300  | 2.2508          | 15.0103         | 0.6396     | -0.0829        | -0.2324          | 0.6590             | 0.1495          | 0.7147              | -0.4138             | 0.3813              | -281.8231      | -292.8875    | -2.6866         | -2.7294       |
| 0.4857        | 1.13  | 400  | 2.8413          | 20.4422         | 0.6295     | -0.1464        | -0.3473          | 0.6540             | 0.2010          | 0.9605              | -0.5524             | 0.5026              | -293.3139      | -299.2286    | -2.5810         | -2.6240       |
| 0.6015        | 1.41  | 500  | 2.4297          | 16.2472         | 0.6215     | -0.0798        | -0.3011          | 0.6660             | 0.2213          | 0.9834              | -0.5416             | 0.5125              | -288.6871      | -292.5703    | -2.5803         | -2.6246       |
| 0.4849        | 1.69  | 600  | 3.8077          | 30.0769         | 0.6153     | -0.2435        | -0.5155          | 0.6630             | 0.2721          | 1.1651              | -0.6779             | 0.6337              | -310.1338      | -308.9421    | -2.5659         | -2.6120       |
| 0.4012        | 1.97  | 700  | 4.4359          | 36.7814         | 0.6160     | -0.3161        | -0.6003          | 0.6660             | 0.2841          | 1.2285              | -0.7320             | 0.6759              | -318.6039      | -316.2043    | -2.5208         | -2.5672       |
| 0.3245        | 2.25  | 800  | 4.9873          | 41.8073         | 0.6123     | -0.3752        | -0.6988          | 0.6660             | 0.3236          | 1.3768              | -0.8214             | 0.7506              | -328.4567      | -322.1156    | -2.4952         | -2.5421       |
| 0.3018        | 2.54  | 900  | 5.0342          | 42.1224         | 0.6084     | -0.3810        | -0.7194          | 0.6680             | 0.3383          | 1.4141              | -0.8336             | 0.7645              | -330.5147      | -322.6951    | -2.4804         | -2.5276       |
| 0.4364        | 2.82  | 1000 | 5.0975          | 42.8746         | 0.6098     | -0.3872        | -0.7242          | 0.6680             | 0.3370          | 1.4157              | -0.8369             | 0.7695              | -331.0000      | -323.3101    | -2.4816         | -2.5285       |
|
|
|
|
|
### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2