
zephyr-7b-dpo-uffull-qlora-5e-7

This model is a QLoRA (PEFT) adapter fine-tuned with DPO from alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5924
  • Rewards/chosen: -0.2516
  • Rewards/rejected: -0.6013
  • Rewards/accuracies: 0.7321
  • Rewards/margins: 0.3497
  • Rewards/margins Max: 1.2300
  • Rewards/margins Min: -0.5547
  • Rewards/margins Std: 0.6038
  • Logps/rejected: -322.2831
  • Logps/chosen: -309.6581
  • Logits/rejected: -2.6832
  • Logits/chosen: -2.7155
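
Because this repository contains a PEFT (QLoRA) adapter rather than a full checkpoint, it is loaded on top of the base SFT model. The sketch below is only an illustration, assuming the adapter id just1nseo/zephyr-7b-dpo-uffull-qlora-5e-7 and peft/transformers versions close to those listed under Framework versions; the 4-bit/bfloat16 and chat-template details are common usage assumptions, not taken from this card.

```python
# Minimal loading sketch: attach the QLoRA DPO adapter to the SFT base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"          # base model named in this card
adapter_id = "just1nseo/zephyr-7b-dpo-uffull-qlora-5e-7"    # this adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # dtype/device are assumptions
)
model = PeftModel.from_pretrained(base, adapter_id)          # merge-free adapter loading

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```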

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
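
For reference, these values map roughly onto a transformers TrainingArguments object as sketched below. The actual training script is not included in this card, and DPO-specific settings (such as the beta coefficient or the LoRA configuration) are not listed, so they are omitted; the bf16 flag is an assumption. The per-device batch sizes combine with the 4 GPUs (gradient accumulation 1) to give the total batch sizes of 16 and 32 above.

```python
# Sketch only: how the listed hyperparameters map onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-uffull-qlora-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs -> total train batch size 16
    per_device_eval_batch_size=8,    # x 4 GPUs -> total eval batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption; precision is not stated in the card
)
```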

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6929 | 0.03 | 100 | 0.6930 | 0.0001 | -0.0003 | 0.5377 | 0.0004 | 0.0054 | -0.0041 | 0.0032 | -262.1841 | -284.4886 | -2.7819 | -2.8200 |
| 0.6922 | 0.05 | 200 | 0.6923 | 0.0008 | -0.0010 | 0.6627 | 0.0019 | 0.0100 | -0.0058 | 0.0051 | -262.2543 | -284.4120 | -2.7814 | -2.8195 |
| 0.6908 | 0.08 | 300 | 0.6903 | 0.0041 | -0.0025 | 0.7143 | 0.0066 | 0.0281 | -0.0141 | 0.0137 | -262.3995 | -284.0884 | -2.7806 | -2.8185 |
| 0.689 | 0.1 | 400 | 0.6870 | 0.0093 | -0.0046 | 0.7183 | 0.0140 | 0.0586 | -0.0282 | 0.0285 | -262.6125 | -283.5621 | -2.7783 | -2.8162 |
| 0.6813 | 0.13 | 500 | 0.6813 | 0.0235 | -0.0040 | 0.7242 | 0.0275 | 0.1137 | -0.0534 | 0.0551 | -262.5450 | -282.1426 | -2.7758 | -2.8132 |
| 0.6712 | 0.16 | 600 | 0.6742 | 0.0200 | -0.0247 | 0.7262 | 0.0447 | 0.1814 | -0.0859 | 0.0884 | -264.6151 | -282.4901 | -2.7638 | -2.8015 |
| 0.6643 | 0.18 | 700 | 0.6653 | 0.0004 | -0.0668 | 0.7242 | 0.0672 | 0.2707 | -0.1305 | 0.1329 | -268.8295 | -284.4591 | -2.7558 | -2.7925 |
| 0.6421 | 0.21 | 800 | 0.6562 | -0.0231 | -0.1154 | 0.7222 | 0.0923 | 0.3706 | -0.1761 | 0.1820 | -273.6847 | -286.8017 | -2.7519 | -2.7880 |
| 0.648 | 0.24 | 900 | 0.6480 | -0.0748 | -0.1938 | 0.7183 | 0.1190 | 0.4823 | -0.2242 | 0.2359 | -281.5314 | -291.9791 | -2.7477 | -2.7835 |
| 0.6547 | 0.26 | 1000 | 0.6378 | -0.0763 | -0.2278 | 0.7183 | 0.1515 | 0.5995 | -0.2816 | 0.2954 | -284.9341 | -292.1262 | -2.7446 | -2.7798 |
| 0.6408 | 0.29 | 1100 | 0.6317 | -0.0432 | -0.2136 | 0.7262 | 0.1704 | 0.6414 | -0.2953 | 0.3163 | -283.5132 | -288.8173 | -2.7545 | -2.7885 |
| 0.6358 | 0.31 | 1200 | 0.6260 | -0.0529 | -0.2480 | 0.7183 | 0.1952 | 0.7219 | -0.3249 | 0.3520 | -286.9514 | -289.7809 | -2.7585 | -2.7914 |
| 0.6297 | 0.34 | 1300 | 0.6215 | -0.1213 | -0.3378 | 0.7143 | 0.2165 | 0.8114 | -0.3727 | 0.4028 | -295.9312 | -296.6275 | -2.7489 | -2.7816 |
| 0.6165 | 0.37 | 1400 | 0.6213 | -0.2177 | -0.4420 | 0.7103 | 0.2243 | 0.8626 | -0.4022 | 0.4264 | -306.3474 | -306.2648 | -2.7404 | -2.7733 |
| 0.6185 | 0.39 | 1500 | 0.6162 | -0.1021 | -0.3356 | 0.7063 | 0.2335 | 0.8779 | -0.3976 | 0.4349 | -295.7101 | -294.7082 | -2.7425 | -2.7745 |
| 0.6066 | 0.42 | 1600 | 0.6141 | -0.1696 | -0.4256 | 0.7123 | 0.2560 | 0.9394 | -0.4398 | 0.4678 | -304.7078 | -301.4554 | -2.7367 | -2.7689 |
| 0.6048 | 0.44 | 1700 | 0.6123 | -0.1220 | -0.3748 | 0.7123 | 0.2529 | 0.9411 | -0.4235 | 0.4656 | -299.6321 | -296.6920 | -2.7315 | -2.7638 |
| 0.609 | 0.47 | 1800 | 0.6090 | -0.1424 | -0.4122 | 0.7282 | 0.2698 | 0.9829 | -0.4478 | 0.4813 | -303.3703 | -298.7344 | -2.7251 | -2.7574 |
| 0.5909 | 0.5 | 1900 | 0.6062 | -0.2373 | -0.5239 | 0.7183 | 0.2866 | 1.0475 | -0.4860 | 0.5181 | -314.5422 | -308.2264 | -2.7186 | -2.7507 |
| 0.6011 | 0.52 | 2000 | 0.6048 | -0.1288 | -0.4109 | 0.7242 | 0.2821 | 1.0037 | -0.4627 | 0.4932 | -303.2409 | -297.3789 | -2.7100 | -2.7425 |
| 0.6047 | 0.55 | 2100 | 0.6031 | -0.1486 | -0.4420 | 0.7262 | 0.2934 | 1.0559 | -0.4792 | 0.5193 | -306.3505 | -299.3512 | -2.7123 | -2.7448 |
| 0.592 | 0.58 | 2200 | 0.6011 | -0.2623 | -0.5777 | 0.7242 | 0.3154 | 1.1326 | -0.5284 | 0.5638 | -319.9217 | -310.7270 | -2.7100 | -2.7423 |
| 0.6285 | 0.6 | 2300 | 0.6022 | -0.3099 | -0.6207 | 0.7242 | 0.3108 | 1.1254 | -0.5181 | 0.5570 | -324.2166 | -315.4819 | -2.7044 | -2.7370 |
| 0.6258 | 0.63 | 2400 | 0.6005 | -0.1642 | -0.4737 | 0.7302 | 0.3095 | 1.0716 | -0.4957 | 0.5259 | -309.5165 | -300.9170 | -2.6960 | -2.7291 |
| 0.5855 | 0.65 | 2500 | 0.5981 | -0.2145 | -0.5381 | 0.7341 | 0.3237 | 1.1337 | -0.5235 | 0.5568 | -315.9617 | -305.9418 | -2.6924 | -2.7253 |
| 0.6095 | 0.68 | 2600 | 0.5970 | -0.2416 | -0.5724 | 0.7262 | 0.3308 | 1.1753 | -0.5364 | 0.5756 | -319.3885 | -308.6579 | -2.6859 | -2.7187 |
| 0.6013 | 0.71 | 2700 | 0.5961 | -0.2450 | -0.5789 | 0.7262 | 0.3340 | 1.1924 | -0.5460 | 0.5830 | -320.0433 | -308.9903 | -2.6845 | -2.7170 |
| 0.6233 | 0.73 | 2800 | 0.5954 | -0.2426 | -0.5787 | 0.7302 | 0.3361 | 1.2015 | -0.5491 | 0.5882 | -320.0177 | -308.7550 | -2.6852 | -2.7174 |
| 0.6119 | 0.76 | 2900 | 0.5944 | -0.2613 | -0.6032 | 0.7282 | 0.3419 | 1.2206 | -0.5595 | 0.6006 | -322.4701 | -310.6289 | -2.6853 | -2.7176 |
| 0.5644 | 0.79 | 3000 | 0.5938 | -0.2218 | -0.5648 | 0.7282 | 0.3430 | 1.1989 | -0.5312 | 0.5872 | -318.6263 | -306.6716 | -2.6826 | -2.7150 |
| 0.5946 | 0.81 | 3100 | 0.5932 | -0.2763 | -0.6239 | 0.7262 | 0.3476 | 1.2359 | -0.5639 | 0.6094 | -324.5376 | -312.1256 | -2.6762 | -2.7090 |
| 0.5961 | 0.84 | 3200 | 0.5930 | -0.2713 | -0.6200 | 0.7262 | 0.3487 | 1.2365 | -0.5595 | 0.6090 | -324.1454 | -311.6203 | -2.6815 | -2.7140 |
| 0.5841 | 0.86 | 3300 | 0.5927 | -0.2686 | -0.6177 | 0.7302 | 0.3491 | 1.2362 | -0.5602 | 0.6093 | -323.9175 | -311.3521 | -2.6834 | -2.7157 |
| 0.611 | 0.89 | 3400 | 0.5925 | -0.2485 | -0.5979 | 0.7361 | 0.3493 | 1.2281 | -0.5496 | 0.6023 | -321.9356 | -309.3477 | -2.6821 | -2.7145 |
| 0.5458 | 0.92 | 3500 | 0.5925 | -0.2494 | -0.5988 | 0.7341 | 0.3494 | 1.2280 | -0.5516 | 0.6025 | -322.0256 | -309.4359 | -2.6792 | -2.7118 |
| 0.5926 | 0.94 | 3600 | 0.5925 | -0.2520 | -0.6014 | 0.7321 | 0.3494 | 1.2312 | -0.5539 | 0.6042 | -322.2860 | -309.6909 | -2.6837 | -2.7160 |
| 0.6096 | 0.97 | 3700 | 0.5926 | -0.2517 | -0.6015 | 0.7341 | 0.3497 | 1.2313 | -0.5539 | 0.6042 | -322.2966 | -309.6683 | -2.6793 | -2.7119 |
| 0.5865 | 0.99 | 3800 | 0.5925 | -0.2517 | -0.6019 | 0.7341 | 0.3502 | 1.2316 | -0.5546 | 0.6038 | -322.3433 | -309.6684 | -2.6801 | -2.7126 |
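
The reward columns follow the usual DPO bookkeeping (as logged by trl's DPOTrainer): each reward is the beta-scaled difference between the policy and reference log-probabilities of a completion, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs where the chosen completion scores higher. The snippet below illustrates this with toy numbers only; the beta used for this run is not reported in the card, so 0.1 is merely a placeholder.

```python
# Illustration of how the reward columns are typically derived in DPO training.
# Toy log-probs; beta=0.1 is a placeholder, not the value used for this run.
import torch

beta = 0.1

def dpo_rewards(policy_logps, ref_logps):
    """Implicit per-example reward: beta * (log pi_policy - log pi_ref)."""
    return beta * (policy_logps - ref_logps)

chosen_rewards = dpo_rewards(torch.tensor([-120.0, -95.0]), torch.tensor([-118.0, -96.0]))
rejected_rewards = dpo_rewards(torch.tensor([-130.0, -100.0]), torch.tensor([-124.0, -99.0]))

margins = chosen_rewards - rejected_rewards                    # "Rewards/margins"
accuracy = (chosen_rewards > rejected_rewards).float().mean()  # "Rewards/accuracies"
print(margins, accuracy)
```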

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2