---
library_name: transformers
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-1.7B-Instruct
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - BAAI/Infinity-Preference
model-index:
  - name: smollm-1.7b-instruct-simpo-v2
    results: []
---

# smollm-1.7b-instruct-simpo-v2

This model is a fine-tuned version of [HuggingFaceTB/SmolLM-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct) on the [BAAI/Infinity-Preference](https://huggingface.co/datasets/BAAI/Infinity-Preference) dataset. It achieves the following results on the evaluation set:

- Loss: 3.0877
- Rewards/chosen: -22.8949
- Rewards/rejected: -24.4444
- Rewards/accuracies: 0.6300
- Rewards/margins: 1.5495
- Logps/rejected: -2.4444
- Logps/chosen: -2.2895
- Logits/rejected: -2.4913
- Logits/chosen: -2.3131
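Under SimPO, the reported rewards are the length-normalized log-probabilities scaled by the reward coefficient β. The card does not state β, but the ratio of the Rewards to the Logps above suggests β = 10; a minimal sketch of how the margin follows from the logps, under that assumption:

```python
# Hedged sketch: beta = 10 is inferred from the ratio of the reported
# Rewards to the length-normalized Logps; this card does not state it.
beta = 10.0  # assumed SimPO reward scale

avg_logp_chosen = -2.2895    # Logps/chosen from the eval set above
avg_logp_rejected = -2.4444  # Logps/rejected from the eval set above

reward_chosen = beta * avg_logp_chosen      # ~ -22.895 (card: -22.8949)
reward_rejected = beta * avg_logp_rejected  # ~ -24.444 (card: -24.4444)
margin = reward_chosen - reward_rejected    # ~ 1.549  (card: 1.5495)

print(reward_chosen, reward_rejected, margin)
```

The small residuals against the card's values are just rounding in the reported logps.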

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
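The list above covers the optimizer schedule but not the SimPO-specific coefficients β (reward scale) and γ (target reward margin), which this card does not report. For reference, the per-pair SimPO objective is the negative log-sigmoid of the length-normalized log-probability margin; a minimal sketch, with β and γ as illustrative assumptions (so it will not reproduce the reported 3.0877 eval loss):

```python
import math

def simpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
               beta: float = 10.0, gamma: float = 1.0) -> float:
    """Per-pair SimPO loss: -log sigmoid(beta * (lp_c - lp_r) - gamma).

    avg_logp_* are sequence log-probs divided by sequence length;
    beta and gamma here are illustrative, not taken from this card.
    """
    margin = beta * (avg_logp_chosen - avg_logp_rejected) - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Plugging in the final eval logps from this card:
loss = simpo_loss(-2.2895, -2.4444)
print(loss)
```

With γ = 0 and equal logps the loss reduces to log 2, the usual sanity check for sigmoid-based preference losses.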

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 3.2871 | 0.0135 | 400 | 3.4379 | -16.5537 | -16.5135 | 0.4700 | -0.0402 | -1.6513 | -1.6554 | -0.7019 | -0.7007 |
| 3.4746 | 0.0270 | 800 | 3.4370 | -16.5561 | -16.5146 | 0.4700 | -0.0415 | -1.6515 | -1.6556 | -0.7002 | -0.6988 |
| 2.8856 | 0.0404 | 1200 | 3.4399 | -16.5623 | -16.5160 | 0.4700 | -0.0464 | -1.6516 | -1.6562 | -0.6997 | -0.6984 |
| 3.8819 | 0.0539 | 1600 | 3.4374 | -16.5639 | -16.5248 | 0.4700 | -0.0391 | -1.6525 | -1.6564 | -0.7012 | -0.6998 |
| 3.622 | 0.0674 | 2000 | 3.4319 | -16.5838 | -16.5551 | 0.4700 | -0.0288 | -1.6555 | -1.6584 | -0.7089 | -0.7069 |
| 3.6924 | 0.0809 | 2400 | 3.4273 | -16.6109 | -16.5901 | 0.4700 | -0.0208 | -1.6590 | -1.6611 | -0.7032 | -0.7007 |
| 3.0591 | 0.0944 | 2800 | 3.4161 | -16.6863 | -16.6979 | 0.4600 | 0.0117 | -1.6698 | -1.6686 | -0.7295 | -0.7253 |
| 3.4937 | 0.1079 | 3200 | 3.4013 | -16.7982 | -16.8590 | 0.4700 | 0.0608 | -1.6859 | -1.6798 | -0.7483 | -0.7412 |
| 3.1565 | 0.1213 | 3600 | 3.3852 | -16.8542 | -16.9385 | 0.4700 | 0.0843 | -1.6939 | -1.6854 | -0.7618 | -0.7526 |
| 2.7504 | 0.1348 | 4000 | 3.3711 | -16.9128 | -17.0175 | 0.4800 | 0.1047 | -1.7018 | -1.6913 | -0.7684 | -0.7574 |
| 3.0312 | 0.1483 | 4400 | 3.3606 | -16.9720 | -17.0910 | 0.4900 | 0.1190 | -1.7091 | -1.6972 | -0.7754 | -0.7629 |
| 4.145 | 0.1618 | 4800 | 3.3407 | -17.0816 | -17.2375 | 0.5100 | 0.1559 | -1.7238 | -1.7082 | -0.7902 | -0.7746 |
| 3.9514 | 0.1753 | 5200 | 3.3126 | -17.1952 | -17.3924 | 0.5100 | 0.1972 | -1.7392 | -1.7195 | -0.8201 | -0.8001 |
| 2.4942 | 0.1887 | 5600 | 3.2864 | -17.2731 | -17.4955 | 0.5100 | 0.2223 | -1.7495 | -1.7273 | -0.8187 | -0.7960 |
| 2.6757 | 0.2022 | 6000 | 3.2615 | -17.3603 | -17.6063 | 0.5200 | 0.2460 | -1.7606 | -1.7360 | -0.7977 | -0.7735 |
| 2.8576 | 0.2157 | 6400 | 3.2382 | -17.5060 | -17.8132 | 0.5500 | 0.3072 | -1.7813 | -1.7506 | -0.8562 | -0.8260 |
| 3.7483 | 0.2292 | 6800 | 3.2140 | -17.5965 | -17.9376 | 0.5700 | 0.3411 | -1.7938 | -1.7596 | -0.8751 | -0.8407 |
| 3.5349 | 0.2427 | 7200 | 3.2035 | -17.6663 | -18.0193 | 0.5800 | 0.3530 | -1.8019 | -1.7666 | -0.8780 | -0.8417 |
| 2.0604 | 0.2562 | 7600 | 3.1925 | -17.7393 | -18.1045 | 0.6100 | 0.3652 | -1.8104 | -1.7739 | -0.9017 | -0.8602 |
| 5.7031 | 0.2696 | 8000 | 3.1672 | -18.0175 | -18.4936 | 0.6100 | 0.4760 | -1.8494 | -1.8018 | -0.9982 | -0.9467 |
| 2.6005 | 0.2831 | 8400 | 3.1475 | -18.1162 | -18.6283 | 0.6100 | 0.5121 | -1.8628 | -1.8116 | -1.0732 | -1.0161 |
| 1.9787 | 0.2966 | 8800 | 3.1226 | -18.3260 | -18.9198 | 0.6100 | 0.5938 | -1.8920 | -1.8326 | -1.1691 | -1.1062 |
| 2.8347 | 0.3101 | 9200 | 3.1156 | -18.4632 | -19.0934 | 0.6100 | 0.6301 | -1.9093 | -1.8463 | -1.2592 | -1.1910 |
| 2.701 | 0.3236 | 9600 | 3.1022 | -18.5083 | -19.1346 | 0.6100 | 0.6264 | -1.9135 | -1.8508 | -1.2785 | -1.2073 |
| 3.772 | 0.3371 | 10000 | 3.0772 | -18.5843 | -19.2491 | 0.6100 | 0.6649 | -1.9249 | -1.8584 | -1.3345 | -1.2587 |
| 2.7414 | 0.3505 | 10400 | 3.0551 | -18.8305 | -19.5946 | 0.6100 | 0.7641 | -1.9595 | -1.8830 | -1.3824 | -1.3004 |
| 2.0287 | 0.3640 | 10800 | 3.0534 | -18.9934 | -19.7985 | 0.6200 | 0.8051 | -1.9798 | -1.8993 | -1.4355 | -1.3467 |
| 1.0473 | 0.3775 | 11200 | 3.0528 | -19.1581 | -19.9858 | 0.6100 | 0.8277 | -1.9986 | -1.9158 | -1.5109 | -1.4173 |
| 2.8106 | 0.3910 | 11600 | 3.0436 | -19.1763 | -19.9989 | 0.6100 | 0.8226 | -1.9999 | -1.9176 | -1.5138 | -1.4206 |
| 3.0344 | 0.4045 | 12000 | 3.0333 | -19.2526 | -20.1079 | 0.6100 | 0.8553 | -2.0108 | -1.9253 | -1.5628 | -1.4657 |
| 2.1886 | 0.4179 | 12400 | 3.0187 | -19.4500 | -20.3818 | 0.6300 | 0.9318 | -2.0382 | -1.9450 | -1.6246 | -1.5217 |
| 4.1181 | 0.4314 | 12800 | 3.0086 | -19.6204 | -20.6104 | 0.6300 | 0.9900 | -2.0610 | -1.9620 | -1.6886 | -1.5818 |
| 1.6647 | 0.4449 | 13200 | 3.0126 | -19.7773 | -20.7949 | 0.6300 | 1.0176 | -2.0795 | -1.9777 | -1.7307 | -1.6181 |
| 4.8533 | 0.4584 | 13600 | 3.0012 | -19.9001 | -20.9633 | 0.6300 | 1.0632 | -2.0963 | -1.9900 | -1.7437 | -1.6288 |
| 2.9945 | 0.4719 | 14000 | 3.0071 | -19.9831 | -21.0361 | 0.6300 | 1.0529 | -2.1036 | -1.9983 | -1.7839 | -1.6667 |
| 2.9377 | 0.4854 | 14400 | 2.9946 | -20.1165 | -21.2172 | 0.6400 | 1.1007 | -2.1217 | -2.0117 | -1.8386 | -1.7178 |
| 2.7856 | 0.4988 | 14800 | 2.9908 | -20.2830 | -21.4151 | 0.6300 | 1.1322 | -2.1415 | -2.0283 | -1.8720 | -1.7468 |
| 4.9446 | 0.5123 | 15200 | 2.9905 | -20.4144 | -21.5669 | 0.6300 | 1.1525 | -2.1567 | -2.0414 | -1.9057 | -1.7760 |
| 3.2834 | 0.5258 | 15600 | 2.9858 | -20.4428 | -21.5993 | 0.6300 | 1.1565 | -2.1599 | -2.0443 | -1.8928 | -1.7633 |
| 1.8705 | 0.5393 | 16000 | 2.9888 | -20.5922 | -21.7774 | 0.6300 | 1.1853 | -2.1777 | -2.0592 | -1.9340 | -1.8009 |
| 4.0587 | 0.5528 | 16400 | 2.9925 | -20.8812 | -22.1359 | 0.6300 | 1.2547 | -2.2136 | -2.0881 | -2.0019 | -1.8627 |
| 3.0706 | 0.5662 | 16800 | 2.9946 | -21.1005 | -22.4176 | 0.6300 | 1.3171 | -2.2418 | -2.1101 | -2.0533 | -1.9104 |
| 3.152 | 0.5797 | 17200 | 2.9916 | -21.2937 | -22.6723 | 0.6200 | 1.3786 | -2.2672 | -2.1294 | -2.1094 | -1.9627 |
| 1.8856 | 0.5932 | 17600 | 2.9847 | -21.2727 | -22.6463 | 0.6200 | 1.3736 | -2.2646 | -2.1273 | -2.1108 | -1.9637 |
| 1.1291 | 0.6067 | 18000 | 2.9981 | -21.5313 | -22.9507 | 0.6200 | 1.4194 | -2.2951 | -2.1531 | -2.1736 | -2.0212 |
| 2.9894 | 0.6202 | 18400 | 3.0033 | -21.6191 | -23.0276 | 0.6200 | 1.4085 | -2.3028 | -2.1619 | -2.2089 | -2.0543 |
| 3.497 | 0.6337 | 18800 | 3.0252 | -21.8198 | -23.2426 | 0.6200 | 1.4228 | -2.3243 | -2.1820 | -2.2285 | -2.0714 |
| 3.18 | 0.6471 | 19200 | 3.0307 | -21.8887 | -23.3005 | 0.6200 | 1.4117 | -2.3300 | -2.1889 | -2.2462 | -2.0862 |
| 1.9522 | 0.6606 | 19600 | 3.0391 | -21.9179 | -23.3214 | 0.6300 | 1.4035 | -2.3321 | -2.1918 | -2.2476 | -2.0875 |
| 2.4878 | 0.6741 | 20000 | 3.0431 | -22.1021 | -23.5543 | 0.6300 | 1.4522 | -2.3554 | -2.2102 | -2.2969 | -2.1333 |
| 2.3506 | 0.6876 | 20400 | 3.0453 | -22.2379 | -23.7220 | 0.6300 | 1.4841 | -2.3722 | -2.2238 | -2.3258 | -2.1603 |
| 3.9719 | 0.7011 | 20800 | 3.0591 | -22.2718 | -23.7317 | 0.6300 | 1.4599 | -2.3732 | -2.2272 | -2.3263 | -2.1600 |
| 1.4942 | 0.7146 | 21200 | 3.0574 | -22.3226 | -23.8044 | 0.6300 | 1.4819 | -2.3804 | -2.2323 | -2.3352 | -2.1680 |
| 0.8797 | 0.7280 | 21600 | 3.0616 | -22.3419 | -23.8235 | 0.6300 | 1.4816 | -2.3823 | -2.2342 | -2.3394 | -2.1721 |
| 2.8176 | 0.7415 | 22000 | 3.0751 | -22.4788 | -23.9643 | 0.6300 | 1.4855 | -2.3964 | -2.2479 | -2.3767 | -2.2073 |
| 3.3744 | 0.7550 | 22400 | 3.0775 | -22.6028 | -24.1137 | 0.6300 | 1.5109 | -2.4114 | -2.2603 | -2.4146 | -2.2423 |
| 1.9708 | 0.7685 | 22800 | 3.0768 | -22.6249 | -24.1479 | 0.6300 | 1.5231 | -2.4148 | -2.2625 | -2.4216 | -2.2482 |
| 2.1589 | 0.7820 | 23200 | 3.0697 | -22.6570 | -24.1936 | 0.6300 | 1.5367 | -2.4194 | -2.2657 | -2.4323 | -2.2591 |
| 3.0872 | 0.7954 | 23600 | 3.0813 | -22.7174 | -24.2489 | 0.6300 | 1.5315 | -2.4249 | -2.2717 | -2.4430 | -2.2683 |
| 3.9705 | 0.8089 | 24000 | 3.0806 | -22.7644 | -24.3076 | 0.6300 | 1.5432 | -2.4308 | -2.2764 | -2.4598 | -2.2840 |
| 3.5691 | 0.8224 | 24400 | 3.0807 | -22.7627 | -24.2931 | 0.6300 | 1.5304 | -2.4293 | -2.2763 | -2.4621 | -2.2857 |
| 1.4467 | 0.8359 | 24800 | 3.0854 | -22.8132 | -24.3525 | 0.6300 | 1.5393 | -2.4353 | -2.2813 | -2.4742 | -2.2963 |
| 2.7241 | 0.8494 | 25200 | 3.0862 | -22.8300 | -24.3745 | 0.6300 | 1.5445 | -2.4375 | -2.2830 | -2.4770 | -2.2988 |
| 2.7441 | 0.8629 | 25600 | 3.0866 | -22.8450 | -24.3876 | 0.6300 | 1.5427 | -2.4388 | -2.2845 | -2.4823 | -2.3048 |
| 1.4801 | 0.8763 | 26000 | 3.0839 | -22.8522 | -24.4010 | 0.6300 | 1.5488 | -2.4401 | -2.2852 | -2.4827 | -2.3057 |
| 2.5965 | 0.8898 | 26400 | 3.0841 | -22.8629 | -24.4169 | 0.6300 | 1.5540 | -2.4417 | -2.2863 | -2.4877 | -2.3095 |
| 3.6415 | 0.9033 | 26800 | 3.0893 | -22.8830 | -24.4340 | 0.6300 | 1.5510 | -2.4434 | -2.2883 | -2.4894 | -2.3114 |
| 2.0584 | 0.9168 | 27200 | 3.0894 | -22.8879 | -24.4268 | 0.6300 | 1.5389 | -2.4427 | -2.2888 | -2.4917 | -2.3134 |
| 2.5068 | 0.9303 | 27600 | 3.0896 | -22.8936 | -24.4408 | 0.6300 | 1.5472 | -2.4441 | -2.2894 | -2.4922 | -2.3134 |
| 0.677 | 0.9437 | 28000 | 3.0835 | -22.8876 | -24.4472 | 0.6300 | 1.5596 | -2.4447 | -2.2888 | -2.4919 | -2.3134 |
| 2.5931 | 0.9572 | 28400 | 3.0875 | -22.8938 | -24.4419 | 0.6300 | 1.5481 | -2.4442 | -2.2894 | -2.4907 | -2.3117 |
| 4.4413 | 0.9707 | 28800 | 3.0893 | -22.8952 | -24.4383 | 0.6300 | 1.5431 | -2.4438 | -2.2895 | -2.4914 | -2.3131 |
| 2.7584 | 0.9842 | 29200 | 3.0874 | -22.8946 | -24.4410 | 0.6300 | 1.5464 | -2.4441 | -2.2895 | -2.4894 | -2.3112 |
| 4.4406 | 0.9977 | 29600 | 3.0877 | -22.8949 | -24.4444 | 0.6300 | 1.5495 | -2.4444 | -2.2895 | -2.4913 | -2.3131 |

### Framework versions

- Transformers 4.45.1
- Pytorch 2.2.2
- Datasets 3.0.1
- Tokenizers 0.20.0