
zephyr-dpop-qlora-uf-5e-7

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6789
  • Positive Losses: 0.2624
  • Dpo Losses: 0.6395
  • Rewards/chosen: 0.2321
  • Rewards/rejected: 0.1098
  • Rewards/accuracies: 0.7180
  • Rewards/margins: 0.1223
  • Rewards/margins Max: 0.4485
  • Rewards/margins Min: -0.1548
  • Rewards/margins Std: 0.2025
  • Logps/rejected: -247.6011
  • Logps/chosen: -261.3811
  • Logits/rejected: -2.6202
  • Logits/chosen: -2.6543
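
The card does not ship a usage snippet, so the following is only a minimal loading sketch: it attaches this QLoRA adapter to the base SFT model with peft and transformers. The repository id just1nseo/zephyr-dpop-qlora-uf-5e-7 and the base model alignment-handbook/zephyr-7b-sft-full come from this card; the dtype, device placement, and example prompt are illustrative assumptions.

```python
# Minimal loading sketch (not part of the original training code):
# load the base SFT model, then apply this repo's LoRA adapter with peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"   # base model named in this card
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-5e-7"   # this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # assumed dtype/placement
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights

# Illustrative generation using the base tokenizer's chat template.
prompt = "Explain what direct preference optimization does in one sentence."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```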

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
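
The training script itself is not included in the card. As a rough illustration only, the hyperparameters above map onto a transformers.TrainingArguments as sketched below; the actual run used a DPO(P)-style trainer with QLoRA adapters on 2 GPUs, and its trainer-specific options (preference beta, LoRA and quantization config) are not listed here, so they are omitted.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the authors' actual training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-5e-7",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,           # train_batch_size per device
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,           # 2 devices x 4 x 2 = total train batch 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```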

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6911 | 0.03 | 100 | 0.6918 | 0.0097 | 0.6901 | 0.0256 | 0.0195 | 0.6670 | 0.0061 | 0.0282 | -0.0136 | 0.0138 | -256.6321 | -282.0331 | -2.7662 | -2.8054 |
| 0.6847 | 0.05 | 200 | 0.6919 | 0.0310 | 0.6806 | 0.0844 | 0.0583 | 0.6710 | 0.0261 | 0.1132 | -0.0512 | 0.0542 | -252.7455 | -276.1540 | -2.7592 | -2.7987 |
| 0.686 | 0.08 | 300 | 0.6901 | 0.0841 | 0.6693 | 0.1465 | 0.0956 | 0.6950 | 0.0509 | 0.2071 | -0.0915 | 0.0989 | -249.0196 | -269.9467 | -2.7474 | -2.7859 |
| 0.6944 | 0.1 | 400 | 0.6911 | 0.1510 | 0.6631 | 0.1581 | 0.0931 | 0.7100 | 0.0650 | 0.2490 | -0.1113 | 0.1195 | -249.2730 | -268.7827 | -2.7115 | -2.7504 |
| 0.6923 | 0.13 | 500 | 0.6788 | 0.0596 | 0.6647 | 0.1948 | 0.1332 | 0.6950 | 0.0617 | 0.2513 | -0.1077 | 0.1190 | -245.2602 | -265.1090 | -2.6843 | -2.7243 |
| 0.663 | 0.16 | 600 | 0.6892 | 0.1483 | 0.6607 | 0.1942 | 0.1226 | 0.6770 | 0.0716 | 0.3008 | -0.1286 | 0.1420 | -246.3230 | -265.1740 | -2.6660 | -2.7036 |
| 0.6784 | 0.18 | 700 | 0.6935 | 0.2142 | 0.6550 | 0.1892 | 0.1049 | 0.6970 | 0.0843 | 0.3275 | -0.1274 | 0.1516 | -248.0892 | -265.6756 | -2.6229 | -2.6624 |
| 0.661 | 0.21 | 800 | 0.6885 | 0.1770 | 0.6538 | 0.1994 | 0.1122 | 0.7020 | 0.0872 | 0.3388 | -0.1292 | 0.1549 | -247.3548 | -264.6508 | -2.6850 | -2.7245 |
| 0.6736 | 0.24 | 900 | 0.6827 | 0.1576 | 0.6557 | 0.2025 | 0.1192 | 0.6940 | 0.0833 | 0.3345 | -0.1335 | 0.1561 | -246.6593 | -264.3388 | -2.6814 | -2.7201 |
| 0.6998 | 0.26 | 1000 | 0.6806 | 0.2131 | 0.6517 | 0.2037 | 0.1115 | 0.7070 | 0.0922 | 0.3499 | -0.1335 | 0.1615 | -247.4245 | -264.2192 | -2.6830 | -2.7190 |
| 0.6943 | 0.29 | 1100 | 0.6808 | 0.2125 | 0.6503 | 0.2101 | 0.1144 | 0.7100 | 0.0957 | 0.3629 | -0.1371 | 0.1674 | -247.1344 | -263.5789 | -2.6633 | -2.6979 |
| 0.6761 | 0.31 | 1200 | 0.6793 | 0.1898 | 0.6511 | 0.2157 | 0.1215 | 0.7110 | 0.0942 | 0.3704 | -0.1366 | 0.1692 | -246.4255 | -263.0201 | -2.6573 | -2.6916 |
| 0.6976 | 0.34 | 1300 | 0.6730 | 0.1194 | 0.6535 | 0.2178 | 0.1297 | 0.7080 | 0.0881 | 0.3434 | -0.1322 | 0.1594 | -245.6055 | -262.8122 | -2.6282 | -2.6641 |
| 0.7536 | 0.37 | 1400 | 0.7005 | 0.3143 | 0.6471 | 0.2121 | 0.1083 | 0.7030 | 0.1038 | 0.3986 | -0.1530 | 0.1838 | -247.7509 | -263.3833 | -2.6211 | -2.6572 |
| 0.6711 | 0.39 | 1500 | 0.6918 | 0.2213 | 0.6489 | 0.2190 | 0.1197 | 0.7040 | 0.0994 | 0.3826 | -0.1451 | 0.1760 | -246.6128 | -262.6917 | -2.5983 | -2.6356 |
| 0.7428 | 0.42 | 1600 | 0.6867 | 0.1652 | 0.6501 | 0.2193 | 0.1228 | 0.7010 | 0.0965 | 0.3730 | -0.1448 | 0.1730 | -246.2957 | -262.6611 | -2.5979 | -2.6328 |
| 0.6593 | 0.44 | 1700 | 0.6785 | 0.2228 | 0.6467 | 0.2221 | 0.1173 | 0.7110 | 0.1048 | 0.3978 | -0.1483 | 0.1825 | -246.8526 | -262.3859 | -2.6262 | -2.6614 |
| 0.6856 | 0.47 | 1800 | 0.6702 | 0.1343 | 0.6504 | 0.2318 | 0.1356 | 0.6980 | 0.0962 | 0.3760 | -0.1454 | 0.1748 | -245.0162 | -261.4142 | -2.5972 | -2.6326 |
| 0.6552 | 0.5 | 1900 | 0.6743 | 0.1855 | 0.6484 | 0.2278 | 0.1267 | 0.6990 | 0.1011 | 0.3920 | -0.1494 | 0.1816 | -245.9063 | -261.8096 | -2.5761 | -2.6118 |
| 0.6577 | 0.52 | 2000 | 0.6748 | 0.2036 | 0.6461 | 0.2310 | 0.1247 | 0.7090 | 0.1063 | 0.4016 | -0.1526 | 0.1853 | -246.1064 | -261.4890 | -2.5869 | -2.6241 |
| 0.6695 | 0.55 | 2100 | 0.6841 | 0.2842 | 0.6443 | 0.2230 | 0.1124 | 0.7100 | 0.1106 | 0.4202 | -0.1537 | 0.1915 | -247.3420 | -262.2980 | -2.6033 | -2.6404 |
| 0.6633 | 0.58 | 2200 | 0.6799 | 0.2580 | 0.6435 | 0.2273 | 0.1147 | 0.7140 | 0.1126 | 0.4254 | -0.1549 | 0.1932 | -247.1040 | -261.8589 | -2.6014 | -2.6383 |
| 0.7136 | 0.6 | 2300 | 0.6781 | 0.2376 | 0.6443 | 0.2290 | 0.1183 | 0.7110 | 0.1107 | 0.4197 | -0.1532 | 0.1914 | -246.7446 | -261.6907 | -2.6118 | -2.6471 |
| 0.6631 | 0.63 | 2400 | 0.6769 | 0.2289 | 0.6450 | 0.2285 | 0.1195 | 0.7080 | 0.1090 | 0.4134 | -0.1509 | 0.1882 | -246.6301 | -261.7479 | -2.6072 | -2.6430 |
| 0.6884 | 0.65 | 2500 | 0.6854 | 0.3215 | 0.6404 | 0.2248 | 0.1047 | 0.7120 | 0.1201 | 0.4408 | -0.1583 | 0.2000 | -248.1103 | -262.1167 | -2.6064 | -2.6413 |
| 0.6701 | 0.68 | 2600 | 0.6817 | 0.2661 | 0.6432 | 0.2290 | 0.1154 | 0.7240 | 0.1136 | 0.4344 | -0.1554 | 0.1960 | -247.0384 | -261.6952 | -2.6116 | -2.6458 |
| 0.668 | 0.71 | 2700 | 0.6771 | 0.2209 | 0.6441 | 0.2330 | 0.1218 | 0.7190 | 0.1112 | 0.4213 | -0.1525 | 0.1911 | -246.4004 | -261.2966 | -2.6196 | -2.6533 |
| 0.6851 | 0.73 | 2800 | 0.6777 | 0.2299 | 0.6430 | 0.2330 | 0.1192 | 0.7090 | 0.1138 | 0.4274 | -0.1550 | 0.1946 | -246.6621 | -261.2937 | -2.6278 | -2.6613 |
| 0.678 | 0.76 | 2900 | 0.6856 | 0.2997 | 0.6402 | 0.2278 | 0.1072 | 0.7110 | 0.1207 | 0.4462 | -0.1603 | 0.2028 | -247.8615 | -261.8085 | -2.6269 | -2.6602 |
| 0.6605 | 0.79 | 3000 | 0.6807 | 0.2415 | 0.6412 | 0.2316 | 0.1134 | 0.7160 | 0.1182 | 0.4380 | -0.1547 | 0.1986 | -247.2367 | -261.4324 | -2.6275 | -2.6605 |
| 0.6874 | 0.81 | 3100 | 0.6753 | 0.2061 | 0.6425 | 0.2349 | 0.1199 | 0.7190 | 0.1150 | 0.4300 | -0.1520 | 0.1951 | -246.5852 | -261.0995 | -2.6151 | -2.6494 |
| 0.6516 | 0.84 | 3200 | 0.6828 | 0.3006 | 0.6385 | 0.2284 | 0.1036 | 0.7160 | 0.1248 | 0.4527 | -0.1586 | 0.2052 | -248.2176 | -261.7539 | -2.6158 | -2.6498 |
| 0.6627 | 0.86 | 3300 | 0.6773 | 0.2406 | 0.6403 | 0.2325 | 0.1123 | 0.7190 | 0.1203 | 0.4419 | -0.1545 | 0.2003 | -247.3520 | -261.3398 | -2.6184 | -2.6526 |
| 0.6517 | 0.89 | 3400 | 0.6814 | 0.2865 | 0.6386 | 0.2300 | 0.1056 | 0.7190 | 0.1244 | 0.4519 | -0.1569 | 0.2045 | -248.0181 | -261.5968 | -2.6213 | -2.6551 |
| 0.7267 | 0.92 | 3500 | 0.6810 | 0.2880 | 0.6385 | 0.2302 | 0.1056 | 0.7200 | 0.1246 | 0.4536 | -0.1569 | 0.2050 | -248.0208 | -261.5744 | -2.6222 | -2.6560 |
| 0.6563 | 0.94 | 3600 | 0.6790 | 0.2627 | 0.6394 | 0.2318 | 0.1093 | 0.7220 | 0.1225 | 0.4487 | -0.1550 | 0.2027 | -247.6491 | -261.4136 | -2.6216 | -2.6555 |
| 0.7039 | 0.97 | 3700 | 0.6790 | 0.2634 | 0.6396 | 0.2320 | 0.1099 | 0.7230 | 0.1222 | 0.4483 | -0.1550 | 0.2025 | -247.5927 | -261.3918 | -2.6220 | -2.6559 |
| 0.6622 | 0.99 | 3800 | 0.6789 | 0.2612 | 0.6395 | 0.2320 | 0.1098 | 0.7220 | 0.1222 | 0.4482 | -0.1549 | 0.2025 | -247.6030 | -261.3938 | -2.6204 | -2.6544 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2