base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-7
  results: []
zephyr-dpop-qlora-uf-ours-5e-7
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:
- Loss: 0.9543
- Positive Losses: 2.5736
- Dpo Losses: 0.6658
- Rewards/chosen: 0.0602
- Rewards/rejected: -0.0038
- Rewards/accuracies: 0.6300
- Rewards/margins: 0.0640
- Rewards/margins Max: 0.3473
- Rewards/margins Min: -0.1824
- Rewards/margins Std: 0.1766
- Logps/rejected: -258.9606
- Logps/chosen: -278.5768
- Logits/rejected: -2.6741
- Logits/chosen: -2.7121
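The reward margin listed above agrees with the difference between the chosen and rejected rewards, which is how TRL's DPO trainer usually defines it. The check below is a minimal sketch using only the values reported above; the variable names are illustrative, and because this run appears to use a modified trainer, the identity is assumed to hold only up to rounding.

```python
# Values copied from the evaluation summary above.
rewards_chosen = 0.0602
rewards_rejected = -0.0038
reported_margin = 0.0640

# Assumption: the margin is the mean of (chosen reward - rejected reward),
# so the difference of the two reported means should reproduce it.
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}")

# Allow for rounding in the reported four-decimal values.
assert abs(margin - reported_margin) < 1e-3
```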
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
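The derived values in this list follow from the base settings, and the learning-rate curve follows from the scheduler type and warmup ratio. The sketch below reproduces the batch-size arithmetic and approximates a linear-warmup, cosine-decay schedule in plain Python. `lr_at` is a hypothetical helper, and `ASSUMED_TOTAL_STEPS` is an estimate extrapolated from the results table (step 1000 at epoch 2.82), not a value stated in this card.

```python
import math

# Base settings copied from the hyperparameter list above.
train_batch_size = 4
eval_batch_size = 8
num_devices = 2
gradient_accumulation_steps = 2

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def lr_at(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: roughly 3 epochs at about 355 optimizer steps per epoch.
ASSUMED_TOTAL_STEPS = 1064

for step in (0, 50, 107, 500, 1000):
    print(step, lr_at(step, ASSUMED_TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-07 once warmup ends and decays toward zero by the final step.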
Training results
Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6884 | 0.28 | 100 | 0.6931 | 0.0111 | 0.6918 | 0.0136 | 0.0108 | 0.6080 | 0.0028 | 0.0179 | -0.0107 | 0.0094 | -257.4949 | -283.2318 | -2.7651 | -2.8042 |
0.6627 | 0.56 | 200 | 0.6995 | 0.1223 | 0.6858 | 0.0465 | 0.0311 | 0.5960 | 0.0153 | 0.0899 | -0.0496 | 0.0465 | -255.4640 | -279.9481 | -2.7485 | -2.7871 |
0.6293 | 0.85 | 300 | 0.7193 | 0.3552 | 0.6803 | 0.0675 | 0.0398 | 0.5960 | 0.0278 | 0.1601 | -0.0863 | 0.0826 | -254.6033 | -277.8385 | -2.7306 | -2.7684 |
0.6236 | 1.13 | 400 | 0.7519 | 0.6894 | 0.6756 | 0.0800 | 0.0412 | 0.6090 | 0.0388 | 0.2182 | -0.1140 | 0.1113 | -254.4585 | -276.5968 | -2.7119 | -2.7494 |
0.6009 | 1.41 | 500 | 0.8434 | 1.5495 | 0.6718 | 0.0639 | 0.0154 | 0.6090 | 0.0484 | 0.2709 | -0.1440 | 0.1389 | -257.0343 | -278.2061 | -2.6920 | -2.7295 |
0.6136 | 1.69 | 600 | 0.8727 | 1.8302 | 0.6691 | 0.0687 | 0.0134 | 0.6130 | 0.0553 | 0.3049 | -0.1595 | 0.1553 | -257.2360 | -277.7244 | -2.6827 | -2.7203 |
0.5918 | 1.97 | 700 | 0.8998 | 2.0811 | 0.6677 | 0.0671 | 0.0081 | 0.6220 | 0.0591 | 0.3231 | -0.1685 | 0.1641 | -257.7734 | -277.8808 | -2.6797 | -2.7172 |
0.5636 | 2.25 | 800 | 0.9371 | 2.4201 | 0.6667 | 0.0611 | -0.0007 | 0.6260 | 0.0618 | 0.3370 | -0.1777 | 0.1716 | -258.6473 | -278.4820 | -2.6734 | -2.7116 |
0.5736 | 2.54 | 900 | 0.9591 | 2.6268 | 0.6659 | 0.0578 | -0.0060 | 0.6320 | 0.0639 | 0.3467 | -0.1823 | 0.1764 | -259.1817 | -278.8090 | -2.6726 | -2.7107 |
0.5825 | 2.82 | 1000 | 0.9543 | 2.5810 | 0.6658 | 0.0598 | -0.0042 | 0.6290 | 0.0640 | 0.3475 | -0.1826 | 0.1767 | -259.0028 | -278.6134 | -2.6749 | -2.7127 |
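A few trends in the table are easier to see programmatically. The sketch below copies five columns from the rows above and checks that the DPO loss falls at every logged step while the validation loss does not, then reports which step has the best reward accuracy. The tuple layout and variable names are illustrative, and the steps-per-epoch figure is only an estimate from the last logged row.

```python
# (step, epoch, validation loss, DPO loss, reward accuracy) copied from the table above.
rows = [
    (100, 0.28, 0.6931, 0.6918, 0.6080),
    (200, 0.56, 0.6995, 0.6858, 0.5960),
    (300, 0.85, 0.7193, 0.6803, 0.5960),
    (400, 1.13, 0.7519, 0.6756, 0.6090),
    (500, 1.41, 0.8434, 0.6718, 0.6090),
    (600, 1.69, 0.8727, 0.6691, 0.6130),
    (700, 1.97, 0.8998, 0.6677, 0.6220),
    (800, 2.25, 0.9371, 0.6667, 0.6260),
    (900, 2.54, 0.9591, 0.6659, 0.6320),
    (1000, 2.82, 0.9543, 0.6658, 0.6290),
]

# Does the DPO loss decrease strictly from one evaluation to the next?
dpo_losses = [r[3] for r in rows]
dpo_decreasing = all(a > b for a, b in zip(dpo_losses, dpo_losses[1:]))

# Steps with the highest reward accuracy and the lowest validation loss.
best_acc_step = max(rows, key=lambda r: r[4])[0]
lowest_val_step = min(rows, key=lambda r: r[2])[0]

# Rough optimizer steps per epoch, estimated from the last logged row.
steps_per_epoch = rows[-1][0] / rows[-1][1]

print("DPO loss strictly decreasing:", dpo_decreasing)
print("best reward accuracy at step:", best_acc_step)
print("lowest validation loss at step:", lowest_val_step)
print("estimated steps per epoch:", round(steps_per_epoch))
```

The overall validation loss is lowest at the first evaluation and rises as the Positive Losses column grows, so checkpoint selection depends on which of these metrics is treated as primary.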
Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
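With library versions close to those listed above, the adapter could be loaded roughly as follows. This is a sketch, not a tested recipe for this checkpoint: `ADAPTER_ID` is a placeholder because this card does not state the repository namespace, `load_adapter` is a hypothetical helper, and taking the tokenizer from the base model is an assumption.

```python
# Base model named in this card's metadata.
BASE_MODEL_ID = "alignment-handbook/zephyr-7b-sft-full"

# Placeholder: replace with the actual Hub repository or a local adapter path.
ADAPTER_ID = "<namespace>/zephyr-dpop-qlora-uf-ours-5e-7"


def load_adapter(adapter_id=ADAPTER_ID, device_map="auto"):
    """Hypothetical helper: load the base model with the PEFT adapter applied."""
    # Imported lazily so the module can be read without the heavy dependencies.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # AutoPeftModelForCausalLM reads the adapter config, loads the base model
    # it points to, and attaches the adapter weights.
    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_id,
        torch_dtype=torch.bfloat16,
        device_map=device_map,  # requires the accelerate package
    )
    # Assumption: the adapter uses the base model's tokenizer unchanged.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    return model, tokenizer
```

Calling `load_adapter()` downloads the 7B base model, so it needs network access and enough memory for the weights in bfloat16.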