---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-uffull-qlora-5e-7
  results: []
---
zephyr-7b-dpo-uffull-qlora-5e-7
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.5924
- Rewards/chosen: -0.2516
- Rewards/rejected: -0.6013
- Rewards/accuracies: 0.7321
- Rewards/margins: 0.3497
- Rewards/margins Max: 1.2300
- Rewards/margins Min: -0.5547
- Rewards/margins Std: 0.6038
- Logps/rejected: -322.2831
- Logps/chosen: -309.6581
- Logits/rejected: -2.6832
- Logits/chosen: -2.7155
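The reward figures in the list above are internally consistent: the mean margin is the mean chosen reward minus the mean rejected reward. The following is a minimal check, assuming the listed values are rounded evaluation-set means; the variable names are illustrative and do not come from the training code.

```python
# Evaluation-set means copied from the list above (rounded to 4 decimals).
rewards_chosen = -0.2516
rewards_rejected = -0.6013
reported_margin = 0.3497

# The margin is the chosen reward minus the rejected reward. Because the mean
# is linear, the same relation holds between the reported means.
margin = rewards_chosen - rewards_rejected

# Allow a small tolerance for rounding in the reported figures.
assert abs(margin - reported_margin) < 1e-3
print(f"margin = {margin:.4f}")
```

Rewards/accuracies is not derivable from these means. In TRL's DPO trainer it is the fraction of evaluation pairs in which the chosen response receives a higher reward than the rejected one, so 0.7321 means roughly 73% of pairs are ranked correctly.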
Model description
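A PEFT adapter for alignment-handbook/zephyr-7b-sft-full, trained with DPO (Direct Preference Optimization) using TRL. The "qlora" in the model name suggests the adapter was trained with QLoRA. More information needed.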
Intended uses & limitations
More information needed
Training and evaluation data
The model was trained and evaluated on the HuggingFaceH4/ultrafeedback_binarized dataset. More information needed.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
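Two of the hyperparameters above are derived rather than set directly: the total batch sizes follow from the per-device sizes and the device count, and the learning rate at any step follows from the cosine schedule with a 10% warmup. The sketch below reproduces both in plain Python. It assumes no gradient accumulation (which is consistent with 4 x 4 = 16), the usual linear-warmup-then-half-cosine shape, and roughly 3800 total steps (an approximation read off the results table, not a value stated in this card). The function names are hypothetical.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step (grad_accum_steps=1 is an assumption)."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-7,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_batch_size(4, 4))  # train: per-device 4 on 4 GPUs
print(total_batch_size(8, 4))  # eval: per-device 8 on 4 GPUs

APPROX_TOTAL_STEPS = 3800  # assumption: read off the last row of the results table
for step in (0, 380, 1900, 3800):
    print(step, cosine_lr(step, APPROX_TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-07 around step 380 and decays to zero by the end of the single epoch, which fits the way the validation metrics flatten out in the last few hundred steps.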
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6929 | 0.03 | 100 | 0.6930 | 0.0001 | -0.0003 | 0.5377 | 0.0004 | 0.0054 | -0.0041 | 0.0032 | -262.1841 | -284.4886 | -2.7819 | -2.8200 |
0.6922 | 0.05 | 200 | 0.6923 | 0.0008 | -0.0010 | 0.6627 | 0.0019 | 0.0100 | -0.0058 | 0.0051 | -262.2543 | -284.4120 | -2.7814 | -2.8195 |
0.6908 | 0.08 | 300 | 0.6903 | 0.0041 | -0.0025 | 0.7143 | 0.0066 | 0.0281 | -0.0141 | 0.0137 | -262.3995 | -284.0884 | -2.7806 | -2.8185 |
0.689 | 0.1 | 400 | 0.6870 | 0.0093 | -0.0046 | 0.7183 | 0.0140 | 0.0586 | -0.0282 | 0.0285 | -262.6125 | -283.5621 | -2.7783 | -2.8162 |
0.6813 | 0.13 | 500 | 0.6813 | 0.0235 | -0.0040 | 0.7242 | 0.0275 | 0.1137 | -0.0534 | 0.0551 | -262.5450 | -282.1426 | -2.7758 | -2.8132 |
0.6712 | 0.16 | 600 | 0.6742 | 0.0200 | -0.0247 | 0.7262 | 0.0447 | 0.1814 | -0.0859 | 0.0884 | -264.6151 | -282.4901 | -2.7638 | -2.8015 |
0.6643 | 0.18 | 700 | 0.6653 | 0.0004 | -0.0668 | 0.7242 | 0.0672 | 0.2707 | -0.1305 | 0.1329 | -268.8295 | -284.4591 | -2.7558 | -2.7925 |
0.6421 | 0.21 | 800 | 0.6562 | -0.0231 | -0.1154 | 0.7222 | 0.0923 | 0.3706 | -0.1761 | 0.1820 | -273.6847 | -286.8017 | -2.7519 | -2.7880 |
0.648 | 0.24 | 900 | 0.6480 | -0.0748 | -0.1938 | 0.7183 | 0.1190 | 0.4823 | -0.2242 | 0.2359 | -281.5314 | -291.9791 | -2.7477 | -2.7835 |
0.6547 | 0.26 | 1000 | 0.6378 | -0.0763 | -0.2278 | 0.7183 | 0.1515 | 0.5995 | -0.2816 | 0.2954 | -284.9341 | -292.1262 | -2.7446 | -2.7798 |
0.6408 | 0.29 | 1100 | 0.6317 | -0.0432 | -0.2136 | 0.7262 | 0.1704 | 0.6414 | -0.2953 | 0.3163 | -283.5132 | -288.8173 | -2.7545 | -2.7885 |
0.6358 | 0.31 | 1200 | 0.6260 | -0.0529 | -0.2480 | 0.7183 | 0.1952 | 0.7219 | -0.3249 | 0.3520 | -286.9514 | -289.7809 | -2.7585 | -2.7914 |
0.6297 | 0.34 | 1300 | 0.6215 | -0.1213 | -0.3378 | 0.7143 | 0.2165 | 0.8114 | -0.3727 | 0.4028 | -295.9312 | -296.6275 | -2.7489 | -2.7816 |
0.6165 | 0.37 | 1400 | 0.6213 | -0.2177 | -0.4420 | 0.7103 | 0.2243 | 0.8626 | -0.4022 | 0.4264 | -306.3474 | -306.2648 | -2.7404 | -2.7733 |
0.6185 | 0.39 | 1500 | 0.6162 | -0.1021 | -0.3356 | 0.7063 | 0.2335 | 0.8779 | -0.3976 | 0.4349 | -295.7101 | -294.7082 | -2.7425 | -2.7745 |
0.6066 | 0.42 | 1600 | 0.6141 | -0.1696 | -0.4256 | 0.7123 | 0.2560 | 0.9394 | -0.4398 | 0.4678 | -304.7078 | -301.4554 | -2.7367 | -2.7689 |
0.6048 | 0.44 | 1700 | 0.6123 | -0.1220 | -0.3748 | 0.7123 | 0.2529 | 0.9411 | -0.4235 | 0.4656 | -299.6321 | -296.6920 | -2.7315 | -2.7638 |
0.609 | 0.47 | 1800 | 0.6090 | -0.1424 | -0.4122 | 0.7282 | 0.2698 | 0.9829 | -0.4478 | 0.4813 | -303.3703 | -298.7344 | -2.7251 | -2.7574 |
0.5909 | 0.5 | 1900 | 0.6062 | -0.2373 | -0.5239 | 0.7183 | 0.2866 | 1.0475 | -0.4860 | 0.5181 | -314.5422 | -308.2264 | -2.7186 | -2.7507 |
0.6011 | 0.52 | 2000 | 0.6048 | -0.1288 | -0.4109 | 0.7242 | 0.2821 | 1.0037 | -0.4627 | 0.4932 | -303.2409 | -297.3789 | -2.7100 | -2.7425 |
0.6047 | 0.55 | 2100 | 0.6031 | -0.1486 | -0.4420 | 0.7262 | 0.2934 | 1.0559 | -0.4792 | 0.5193 | -306.3505 | -299.3512 | -2.7123 | -2.7448 |
0.592 | 0.58 | 2200 | 0.6011 | -0.2623 | -0.5777 | 0.7242 | 0.3154 | 1.1326 | -0.5284 | 0.5638 | -319.9217 | -310.7270 | -2.7100 | -2.7423 |
0.6285 | 0.6 | 2300 | 0.6022 | -0.3099 | -0.6207 | 0.7242 | 0.3108 | 1.1254 | -0.5181 | 0.5570 | -324.2166 | -315.4819 | -2.7044 | -2.7370 |
0.6258 | 0.63 | 2400 | 0.6005 | -0.1642 | -0.4737 | 0.7302 | 0.3095 | 1.0716 | -0.4957 | 0.5259 | -309.5165 | -300.9170 | -2.6960 | -2.7291 |
0.5855 | 0.65 | 2500 | 0.5981 | -0.2145 | -0.5381 | 0.7341 | 0.3237 | 1.1337 | -0.5235 | 0.5568 | -315.9617 | -305.9418 | -2.6924 | -2.7253 |
0.6095 | 0.68 | 2600 | 0.5970 | -0.2416 | -0.5724 | 0.7262 | 0.3308 | 1.1753 | -0.5364 | 0.5756 | -319.3885 | -308.6579 | -2.6859 | -2.7187 |
0.6013 | 0.71 | 2700 | 0.5961 | -0.2450 | -0.5789 | 0.7262 | 0.3340 | 1.1924 | -0.5460 | 0.5830 | -320.0433 | -308.9903 | -2.6845 | -2.7170 |
0.6233 | 0.73 | 2800 | 0.5954 | -0.2426 | -0.5787 | 0.7302 | 0.3361 | 1.2015 | -0.5491 | 0.5882 | -320.0177 | -308.7550 | -2.6852 | -2.7174 |
0.6119 | 0.76 | 2900 | 0.5944 | -0.2613 | -0.6032 | 0.7282 | 0.3419 | 1.2206 | -0.5595 | 0.6006 | -322.4701 | -310.6289 | -2.6853 | -2.7176 |
0.5644 | 0.79 | 3000 | 0.5938 | -0.2218 | -0.5648 | 0.7282 | 0.3430 | 1.1989 | -0.5312 | 0.5872 | -318.6263 | -306.6716 | -2.6826 | -2.7150 |
0.5946 | 0.81 | 3100 | 0.5932 | -0.2763 | -0.6239 | 0.7262 | 0.3476 | 1.2359 | -0.5639 | 0.6094 | -324.5376 | -312.1256 | -2.6762 | -2.7090 |
0.5961 | 0.84 | 3200 | 0.5930 | -0.2713 | -0.6200 | 0.7262 | 0.3487 | 1.2365 | -0.5595 | 0.6090 | -324.1454 | -311.6203 | -2.6815 | -2.7140 |
0.5841 | 0.86 | 3300 | 0.5927 | -0.2686 | -0.6177 | 0.7302 | 0.3491 | 1.2362 | -0.5602 | 0.6093 | -323.9175 | -311.3521 | -2.6834 | -2.7157 |
0.611 | 0.89 | 3400 | 0.5925 | -0.2485 | -0.5979 | 0.7361 | 0.3493 | 1.2281 | -0.5496 | 0.6023 | -321.9356 | -309.3477 | -2.6821 | -2.7145 |
0.5458 | 0.92 | 3500 | 0.5925 | -0.2494 | -0.5988 | 0.7341 | 0.3494 | 1.2280 | -0.5516 | 0.6025 | -322.0256 | -309.4359 | -2.6792 | -2.7118 |
0.5926 | 0.94 | 3600 | 0.5925 | -0.2520 | -0.6014 | 0.7321 | 0.3494 | 1.2312 | -0.5539 | 0.6042 | -322.2860 | -309.6909 | -2.6837 | -2.7160 |
0.6096 | 0.97 | 3700 | 0.5926 | -0.2517 | -0.6015 | 0.7341 | 0.3497 | 1.2313 | -0.5539 | 0.6042 | -322.2966 | -309.6683 | -2.6793 | -2.7119 |
0.5865 | 0.99 | 3800 | 0.5925 | -0.2517 | -0.6019 | 0.7341 | 0.3502 | 1.2316 | -0.5546 | 0.6038 | -322.3433 | -309.6684 | -2.6801 | -2.7126 |
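The first rows of the table start at a loss of about 0.693, which is ln 2. That is the expected starting point if the run uses the standard sigmoid DPO loss, where the per-pair loss is -log(sigmoid(margin)) and the margin is zero while the policy still matches the reference model. The sketch below illustrates this. The loss type is an assumption (the card does not state it) and the function name is hypothetical.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp().
    return -margin + math.log1p(math.exp(margin))


# At zero margin the loss equals ln 2, matching the first rows of the table.
print(dpo_sigmoid_loss(0.0), math.log(2))

# Loss evaluated at the final mean margin from the table (step 3800).
print(dpo_sigmoid_loss(0.3502))
```

The loss evaluated at the mean margin is lower than the reported validation loss of 0.5925. This is expected: the reported figure averages the per-pair losses, and because the loss is convex in the margin, a wide spread of margins (the Std column is about 0.60) pushes that average above the loss at the mean.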
Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
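Since the card lists peft as the library, the adapter can presumably be loaded on top of the base model with PEFT's auto classes. The sketch below is one way to do that and is not taken from the training code. The helper name load_adapter is hypothetical, the bfloat16 dtype is an assumption, and the adapter's full repository id (namespace plus model name) is not given in this card, so it is left as a parameter.

```python
def load_adapter(adapter_id: str):
    """Load the base model with the adapter applied, plus the base tokenizer.

    adapter_id is the Hub repository id or local path of this adapter;
    the card does not state the namespace, so the caller must supply it.
    """
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Reads the adapter config, loads the referenced base model, and
    # attaches the adapter weights. bfloat16 is an assumption.
    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_id, torch_dtype=torch.bfloat16
    )
    # The tokenizer is taken from the base model named in the metadata.
    tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")
    return model, tokenizer


# Example call (downloads a 7B-parameter base model, so it is not run here):
# model, tokenizer = load_adapter("<namespace>/zephyr-7b-dpo-uffull-qlora-5e-7")
```

Matching the framework versions listed above is the safest way to avoid adapter-format incompatibilities, although newer PEFT releases generally load older adapters.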