zephyr-7b-dpo-qlora

This model is a DPO fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora, trained with QLoRA adapters on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the DPO metric definitions are sketched after this list):

  • Loss: 0.4863
  • Rewards/chosen: -2.8122
  • Rewards/rejected: -3.9101
  • Rewards/accuracies: 0.7395
  • Rewards/margins: 1.0979
  • Logps/rejected: -635.6185
  • Logps/chosen: -545.8760
  • Logits/rejected: -1.1318
  • Logits/chosen: -1.2525
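
The reward, margin, and accuracy entries above follow the implicit-reward convention of Direct Preference Optimization (DPO), and the metric names match what trl's DPOTrainer logs. As a brief, non-authoritative sketch of the standard formulation (β is the DPO trade-off coefficient; nothing below is taken from this repository's training code):

```latex
% Implicit reward of completion y for prompt x, relative to the frozen reference model:
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Rewards/chosen and Rewards/rejected report r_\theta on the preferred and dispreferred
% completions, Rewards/margins is their difference, and Rewards/accuracies is the
% fraction of pairs with a positive margin. The training objective is
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\log \sigma\!\left( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \right)
```

Logps/chosen and Logps/rejected are (roughly) the policy's summed log-probabilities of each completion, and Logits/chosen and Logits/rejected are mean logits over the completion tokens, again following DPOTrainer's logging conventions.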

Model description

More information needed

Intended uses & limitations

More information needed
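
Since this repository contains only a PEFT (QLoRA) adapter, one plausible way to use it is to load the underlying base model in 4-bit and attach the adapter on top. The sketch below is illustrative rather than an officially documented recipe: the 4-bit settings mirror common QLoRA defaults and may differ from those used in training, and it assumes the adapter repo ships the tokenizer and chat template.

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

adapter_id = "objects76/zephyr-7b-dpo-qlora"  # this repository (a PEFT adapter)

# 4-bit NF4 quantization, mirroring common QLoRA settings (assumed, not stated on this card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# AutoPeftModelForCausalLM reads the adapter config, loads the underlying base model,
# and attaches this adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, quantization_config=bnb_config, device_map="auto"
)
# Assumes the adapter repo includes tokenizer files; otherwise load them from the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain direct preference optimization in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```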

Training and evaluation data

More information needed
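
The card names HuggingFaceH4/ultrafeedback_binarized as the training and evaluation data. As a hedged illustration for inspecting the preference pairs (the split and field names below follow the public dataset's published layout and are not taken from this card):

```python
from datasets import load_dataset

# Preference-formatted UltraFeedback; "train_prefs" is one of the dataset's published splits.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]
print(example.keys())            # expect fields such as "prompt", "chosen", "rejected"
print(example["prompt"][:200])   # preview the prompt text
```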

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent-configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
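
The sketch below is a hypothetical transformers TrainingArguments mirroring the hyperparameters listed above; the actual run used a DPO training loop on top of a QLoRA/PEFT setup, and DPO-specific settings (for example β) are not reported on this card. The eval cadence of 100 steps is inferred from the results table below; the output directory and precision are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # assumed output directory
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,       # effective train batch size of 16, as reported
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    optim="adamw_torch",                 # Adam-family optimizer with betas=(0.9, 0.999), eps=1e-8
    evaluation_strategy="steps",
    eval_steps=100,                      # matches the 100-step eval interval in the results table
)
```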

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6821 | 0.03 | 100 | 0.6821 | 0.0498 | 0.0267 | 0.6565 | 0.0231 | -241.9392 | -259.6706 | -1.9557 | -2.0951 |
| 0.6496 | 0.05 | 200 | 0.6487 | -0.0543 | -0.1608 | 0.6810 | 0.1065 | -260.6906 | -270.0797 | -1.9313 | -2.0680 |
| 0.6042 | 0.08 | 300 | 0.6216 | -0.3050 | -0.5140 | 0.6730 | 0.2090 | -296.0115 | -295.1514 | -1.8895 | -2.0229 |
| 0.6218 | 0.1 | 400 | 0.5940 | -0.6189 | -0.9584 | 0.6810 | 0.3395 | -340.4455 | -326.5407 | -1.8155 | -1.9431 |
| 0.5674 | 0.13 | 500 | 0.5780 | -1.5729 | -2.0527 | 0.7040 | 0.4797 | -449.8770 | -421.9457 | -1.6637 | -1.7893 |
| 0.5632 | 0.16 | 600 | 0.5649 | -0.7810 | -1.2808 | 0.7040 | 0.4999 | -372.6913 | -342.7494 | -1.6489 | -1.7786 |
| 0.5331 | 0.18 | 700 | 0.5607 | -1.9088 | -2.6807 | 0.7060 | 0.7719 | -512.6751 | -455.5275 | -1.4691 | -1.5919 |
| 0.4996 | 0.21 | 800 | 0.5433 | -1.4500 | -2.1596 | 0.7070 | 0.7096 | -460.5685 | -409.6544 | -1.5461 | -1.6710 |
| 0.514 | 0.24 | 900 | 0.5440 | -1.2657 | -1.9170 | 0.7190 | 0.6512 | -436.3041 | -391.2230 | -1.5014 | -1.6214 |
| 0.5468 | 0.26 | 1000 | 0.5418 | -1.3702 | -2.0703 | 0.7175 | 0.7001 | -451.6408 | -401.6767 | -1.4449 | -1.5656 |
| 0.569 | 0.29 | 1100 | 0.5299 | -1.1397 | -1.8623 | 0.7210 | 0.7227 | -430.8414 | -378.6177 | -1.4278 | -1.5524 |
| 0.5732 | 0.31 | 1200 | 0.5185 | -1.1057 | -1.8287 | 0.7250 | 0.7231 | -427.4810 | -375.2183 | -1.3596 | -1.4804 |
| 0.5332 | 0.34 | 1300 | 0.5315 | -2.1367 | -3.0509 | 0.7240 | 0.9142 | -549.7025 | -478.3255 | -1.1977 | -1.3072 |
| 0.5431 | 0.37 | 1400 | 0.5211 | -1.2563 | -2.0974 | 0.7260 | 0.8411 | -454.3522 | -390.2846 | -1.3130 | -1.4314 |
| 0.4862 | 0.39 | 1500 | 0.5162 | -1.3677 | -2.2741 | 0.7355 | 0.9063 | -472.0146 | -401.4262 | -1.2795 | -1.4015 |
| 0.5858 | 0.42 | 1600 | 0.5073 | -1.8100 | -2.6996 | 0.7365 | 0.8896 | -514.5671 | -445.6515 | -1.1534 | -1.2718 |
| 0.5147 | 0.44 | 1700 | 0.5000 | -2.2681 | -3.2167 | 0.7340 | 0.9486 | -566.2829 | -491.4621 | -1.1468 | -1.2691 |
| 0.4809 | 0.47 | 1800 | 0.5022 | -2.9278 | -3.9903 | 0.7405 | 1.0625 | -643.6409 | -557.4312 | -1.0617 | -1.1786 |
| 0.46 | 0.5 | 1900 | 0.5003 | -2.4333 | -3.5014 | 0.7355 | 1.0681 | -594.7523 | -507.9823 | -1.1041 | -1.2253 |
| 0.477 | 0.52 | 2000 | 0.4989 | -2.3912 | -3.3897 | 0.7345 | 0.9985 | -583.5771 | -503.7692 | -1.1185 | -1.2392 |
| 0.5068 | 0.55 | 2100 | 0.4939 | -2.4778 | -3.4672 | 0.7430 | 0.9894 | -591.3240 | -512.4297 | -1.1255 | -1.2462 |
| 0.4832 | 0.58 | 2200 | 0.4925 | -2.1250 | -3.0518 | 0.7425 | 0.9268 | -549.7868 | -477.1522 | -1.1670 | -1.2899 |
| 0.4731 | 0.6 | 2300 | 0.4923 | -2.8792 | -4.0084 | 0.7435 | 1.1291 | -645.4448 | -552.5742 | -1.0953 | -1.2155 |
| 0.4782 | 0.63 | 2400 | 0.4923 | -2.8503 | -3.9248 | 0.7420 | 1.0745 | -637.0914 | -549.6804 | -1.0794 | -1.1978 |
| 0.4983 | 0.65 | 2500 | 0.4906 | -2.5713 | -3.6558 | 0.7410 | 1.0845 | -610.1890 | -521.7778 | -1.1292 | -1.2522 |
| 0.4746 | 0.68 | 2600 | 0.4947 | -2.5857 | -3.7233 | 0.7365 | 1.1375 | -616.9340 | -523.2234 | -1.1267 | -1.2491 |
| 0.514 | 0.71 | 2700 | 0.4924 | -2.6975 | -3.8049 | 0.7355 | 1.1074 | -625.0958 | -534.3994 | -1.1248 | -1.2463 |
| 0.4662 | 0.73 | 2800 | 0.4899 | -2.8300 | -3.9668 | 0.7380 | 1.1368 | -641.2913 | -547.6557 | -1.1134 | -1.2345 |
| 0.5111 | 0.76 | 2900 | 0.4873 | -2.9392 | -4.0635 | 0.7405 | 1.1244 | -650.9627 | -558.5706 | -1.1188 | -1.2396 |
| 0.4758 | 0.79 | 3000 | 0.4866 | -2.8621 | -3.9416 | 0.7410 | 1.0795 | -638.7724 | -550.8655 | -1.1318 | -1.2526 |
| 0.4908 | 0.81 | 3100 | 0.4869 | -2.8503 | -3.9411 | 0.7420 | 1.0908 | -638.7193 | -549.6837 | -1.1347 | -1.2555 |
| 0.4641 | 0.84 | 3200 | 0.4866 | -2.8111 | -3.8990 | 0.7405 | 1.0878 | -634.5079 | -545.7666 | -1.1347 | -1.2554 |
| 0.5096 | 0.86 | 3300 | 0.4864 | -2.7992 | -3.8880 | 0.7395 | 1.0887 | -633.4041 | -544.5740 | -1.1379 | -1.2586 |
| 0.455 | 0.89 | 3400 | 0.4866 | -2.8126 | -3.9082 | 0.7395 | 1.0956 | -635.4322 | -545.9153 | -1.1336 | -1.2544 |
| 0.5262 | 0.92 | 3500 | 0.4864 | -2.8110 | -3.9081 | 0.7410 | 1.0971 | -635.4207 | -545.7535 | -1.1342 | -1.2550 |
| 0.466 | 0.94 | 3600 | 0.4866 | -2.8133 | -3.9106 | 0.7400 | 1.0973 | -635.6727 | -545.9836 | -1.1347 | -1.2555 |
| 0.4945 | 0.97 | 3700 | 0.4864 | -2.8101 | -3.9080 | 0.7400 | 1.0979 | -635.4124 | -545.6666 | -1.1321 | -1.2528 |
| 0.5013 | 0.99 | 3800 | 0.4864 | -2.8126 | -3.9101 | 0.7395 | 1.0975 | -635.6184 | -545.9131 | -1.1317 | -1.2524 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2