# zephyr-dpop-qlora-uf-ours-5e-6
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:
- Loss: 5.1264
- Positive Losses: 43.1884
- Dpo Losses: 0.6101
- Rewards/chosen: -0.3903
- Rewards/rejected: -0.7274
- Rewards/accuracies: 0.6670
- Rewards/margins: 0.3370
- Rewards/margins Max: 1.4167
- Rewards/margins Min: -0.8378
- Rewards/margins Std: 0.7707
- Logps/rejected: -331.3143
- Logps/chosen: -323.6263
- Logits/rejected: -2.4808
- Logits/chosen: -2.5277
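
Since this is a QLoRA adapter trained with PEFT, it is meant to be loaded on top of the base SFT model rather than used standalone. Below is a minimal loading sketch, assuming the adapter is published as `just1nseo/zephyr-dpop-qlora-uf-ours-5e-6` and using the standard `transformers`/`peft` APIs listed under Framework versions; adjust dtype and device placement to your hardware.

```python
# Minimal sketch: attach the QLoRA adapter to the SFT base model for inference.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-ours-5e-6"  # this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # loads the LoRA weights

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```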
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
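
The training script itself is not included in this card. As a rough reconstruction (an assumption, not the actual configuration file), the values above map onto a `transformers.TrainingArguments` object as sketched below; the effective train batch size of 16 comes from 4 per device × 2 GPUs × 2 gradient-accumulation steps. The trainer class (a DPO-style preference trainer that also tracks a positive-loss term, judging by the metric names) is not shown.

```python
# Hypothetical reconstruction of the listed hyperparameters; not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 2 GPUs x 2 accumulation steps = 16 effective
    per_device_eval_batch_size=8,    # x 2 GPUs = 16 effective
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                  # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: bf16 mixed precision on multi-GPU
)
```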
### Training results
Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6302 | 0.28 | 100 | 0.8170 | 1.2658 | 0.6732 | 0.0877 | 0.0421 | 0.5920 | 0.0456 | 0.2717 | -0.1472 | 0.1389 | -254.3717 | -275.8216 | -2.6655 | -2.7015 |
0.5709 | 0.56 | 200 | 2.1527 | 14.1341 | 0.6518 | -0.0932 | -0.2071 | 0.6360 | 0.1139 | 0.6302 | -0.3647 | 0.3297 | -279.2877 | -293.9101 | -2.6591 | -2.6989 |
0.4758 | 0.85 | 300 | 2.2508 | 15.0103 | 0.6396 | -0.0829 | -0.2324 | 0.6590 | 0.1495 | 0.7147 | -0.4138 | 0.3813 | -281.8231 | -292.8875 | -2.6866 | -2.7294 |
0.4857 | 1.13 | 400 | 2.8413 | 20.4422 | 0.6295 | -0.1464 | -0.3473 | 0.6540 | 0.2010 | 0.9605 | -0.5524 | 0.5026 | -293.3139 | -299.2286 | -2.5810 | -2.6240 |
0.6015 | 1.41 | 500 | 2.4297 | 16.2472 | 0.6215 | -0.0798 | -0.3011 | 0.6660 | 0.2213 | 0.9834 | -0.5416 | 0.5125 | -288.6871 | -292.5703 | -2.5803 | -2.6246 |
0.4849 | 1.69 | 600 | 3.8077 | 30.0769 | 0.6153 | -0.2435 | -0.5155 | 0.6630 | 0.2721 | 1.1651 | -0.6779 | 0.6337 | -310.1338 | -308.9421 | -2.5659 | -2.6120 |
0.4012 | 1.97 | 700 | 4.4359 | 36.7814 | 0.6160 | -0.3161 | -0.6003 | 0.6660 | 0.2841 | 1.2285 | -0.7320 | 0.6759 | -318.6039 | -316.2043 | -2.5208 | -2.5672 |
0.3245 | 2.25 | 800 | 4.9873 | 41.8073 | 0.6123 | -0.3752 | -0.6988 | 0.6660 | 0.3236 | 1.3768 | -0.8214 | 0.7506 | -328.4567 | -322.1156 | -2.4952 | -2.5421 |
0.3018 | 2.54 | 900 | 5.0342 | 42.1224 | 0.6084 | -0.3810 | -0.7194 | 0.6680 | 0.3383 | 1.4141 | -0.8336 | 0.7645 | -330.5147 | -322.6951 | -2.4804 | -2.5276 |
0.4364 | 2.82 | 1000 | 5.0975 | 42.8746 | 0.6098 | -0.3872 | -0.7242 | 0.6680 | 0.3370 | 1.4157 | -0.8369 | 0.7695 | -331.0000 | -323.3101 | -2.4816 | -2.5285 |
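
For reference, the reward columns above are implicit DPO rewards. A minimal sketch of how such metrics are typically computed, assuming the standard DPO formulation (the exact objective of this run, which also tracks a separate positive loss, is not documented here): the reward is the scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen-minus-rejected difference.

$$
r_\theta(x, y) = \beta \left[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right],
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

Under this convention, "Rewards/accuracies" is the fraction of evaluation pairs with a positive margin.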
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
### Model tree for just1nseo/zephyr-dpop-qlora-uf-ours-5e-6

- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full