---
library_name: transformers
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-1.7B-Instruct
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- BAAI/Infinity-Preference
model-index:
- name: smollm-1.7b-instruct-simpo-v2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# smollm-1.7b-instruct-simpo-v2

This model is a fine-tuned version of [HuggingFaceTB/SmolLM-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct) on the BAAI/Infinity-Preference dataset.
At the final evaluation checkpoint (step 29600, epoch 0.9977), it achieves the following results on the evaluation set:
- Loss: 3.0877
- Rewards/chosen: -22.8949
- Rewards/rejected: -24.4444
- Rewards/accuracies: 0.6300
- Rewards/margins: 1.5495
- Logps/rejected: -2.4444
- Logps/chosen: -2.2895
- Logits/rejected: -2.4913
- Logits/chosen: -2.3131
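
The reported metrics are internally consistent, which a little arithmetic can confirm. The sketch below uses only the numbers listed above: it checks that the margin equals the chosen reward minus the rejected reward, and prints the ratio of each reward to its log-probability. Both ratios come out very close to 10. That factor is an observation from these numbers, not a documented hyperparameter; it would be consistent with a reward defined as a scaled, length-normalized log-probability (as in SimPO-style objectives, which the model name hints at), but this card does not state it.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -22.8949
rewards_rejected = -24.4444
rewards_margin = 1.5495
logps_chosen = -2.2895
logps_rejected = -2.4444

# The margin is the gap between the chosen and rejected rewards.
computed_margin = rewards_chosen - rewards_rejected
assert math.isclose(computed_margin, rewards_margin, abs_tol=1e-4)

# Ratio of reward to log-probability (an inference from the numbers,
# not a setting documented in this card).
scale_chosen = rewards_chosen / logps_chosen
scale_rejected = rewards_rejected / logps_rejected

print(f"margin = {computed_margin:.4f}")
print(f"reward / logp: chosen = {scale_chosen:.3f}, rejected = {scale_rejected:.3f}")
```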

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
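
To make the scheduler settings concrete, here is a minimal pure-Python sketch of a cosine schedule with linear warmup, using the `learning_rate` (1e-06) and `lr_scheduler_warmup_ratio` (0.1) listed above. The function name `cosine_lr_with_warmup`, the total step count, and the way warmup steps are rounded are assumptions for illustration; the trainer's own scheduler may differ in such details.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=1e-6, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    # Assumption: warmup steps are the truncated product of ratio and total steps.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total step count; the card does not state the exact value.
TOTAL_STEPS = 30_000

for step in (0, 1_500, 3_000, 16_500, 30_000):
    print(step, f"{cosine_lr_with_warmup(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the rate climbs linearly to 1e-06 over the first 10% of steps, falls to half of that at the midpoint of the decay phase, and reaches zero at the last step.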

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 3.2871        | 0.0135 | 400   | 3.4379          | -16.5537       | -16.5135         | 0.4700             | -0.0402         | -1.6513        | -1.6554      | -0.7019         | -0.7007       |
| 3.4746        | 0.0270 | 800   | 3.4370          | -16.5561       | -16.5146         | 0.4700             | -0.0415         | -1.6515        | -1.6556      | -0.7002         | -0.6988       |
| 2.8856        | 0.0404 | 1200  | 3.4399          | -16.5623       | -16.5160         | 0.4700             | -0.0464         | -1.6516        | -1.6562      | -0.6997         | -0.6984       |
| 3.8819        | 0.0539 | 1600  | 3.4374          | -16.5639       | -16.5248         | 0.4700             | -0.0391         | -1.6525        | -1.6564      | -0.7012         | -0.6998       |
| 3.622         | 0.0674 | 2000  | 3.4319          | -16.5838       | -16.5551         | 0.4700             | -0.0288         | -1.6555        | -1.6584      | -0.7089         | -0.7069       |
| 3.6924        | 0.0809 | 2400  | 3.4273          | -16.6109       | -16.5901         | 0.4700             | -0.0208         | -1.6590        | -1.6611      | -0.7032         | -0.7007       |
| 3.0591        | 0.0944 | 2800  | 3.4161          | -16.6863       | -16.6979         | 0.4600             | 0.0117          | -1.6698        | -1.6686      | -0.7295         | -0.7253       |
| 3.4937        | 0.1079 | 3200  | 3.4013          | -16.7982       | -16.8590         | 0.4700             | 0.0608          | -1.6859        | -1.6798      | -0.7483         | -0.7412       |
| 3.1565        | 0.1213 | 3600  | 3.3852          | -16.8542       | -16.9385         | 0.4700             | 0.0843          | -1.6939        | -1.6854      | -0.7618         | -0.7526       |
| 2.7504        | 0.1348 | 4000  | 3.3711          | -16.9128       | -17.0175         | 0.4800             | 0.1047          | -1.7018        | -1.6913      | -0.7684         | -0.7574       |
| 3.0312        | 0.1483 | 4400  | 3.3606          | -16.9720       | -17.0910         | 0.4900             | 0.1190          | -1.7091        | -1.6972      | -0.7754         | -0.7629       |
| 4.145         | 0.1618 | 4800  | 3.3407          | -17.0816       | -17.2375         | 0.5100             | 0.1559          | -1.7238        | -1.7082      | -0.7902         | -0.7746       |
| 3.9514        | 0.1753 | 5200  | 3.3126          | -17.1952       | -17.3924         | 0.5100             | 0.1972          | -1.7392        | -1.7195      | -0.8201         | -0.8001       |
| 2.4942        | 0.1887 | 5600  | 3.2864          | -17.2731       | -17.4955         | 0.5100             | 0.2223          | -1.7495        | -1.7273      | -0.8187         | -0.7960       |
| 2.6757        | 0.2022 | 6000  | 3.2615          | -17.3603       | -17.6063         | 0.5200             | 0.2460          | -1.7606        | -1.7360      | -0.7977         | -0.7735       |
| 2.8576        | 0.2157 | 6400  | 3.2382          | -17.5060       | -17.8132         | 0.5500             | 0.3072          | -1.7813        | -1.7506      | -0.8562         | -0.8260       |
| 3.7483        | 0.2292 | 6800  | 3.2140          | -17.5965       | -17.9376         | 0.5700             | 0.3411          | -1.7938        | -1.7596      | -0.8751         | -0.8407       |
| 3.5349        | 0.2427 | 7200  | 3.2035          | -17.6663       | -18.0193         | 0.5800             | 0.3530          | -1.8019        | -1.7666      | -0.8780         | -0.8417       |
| 2.0604        | 0.2562 | 7600  | 3.1925          | -17.7393       | -18.1045         | 0.6100             | 0.3652          | -1.8104        | -1.7739      | -0.9017         | -0.8602       |
| 5.7031        | 0.2696 | 8000  | 3.1672          | -18.0175       | -18.4936         | 0.6100             | 0.4760          | -1.8494        | -1.8018      | -0.9982         | -0.9467       |
| 2.6005        | 0.2831 | 8400  | 3.1475          | -18.1162       | -18.6283         | 0.6100             | 0.5121          | -1.8628        | -1.8116      | -1.0732         | -1.0161       |
| 1.9787        | 0.2966 | 8800  | 3.1226          | -18.3260       | -18.9198         | 0.6100             | 0.5938          | -1.8920        | -1.8326      | -1.1691         | -1.1062       |
| 2.8347        | 0.3101 | 9200  | 3.1156          | -18.4632       | -19.0934         | 0.6100             | 0.6301          | -1.9093        | -1.8463      | -1.2592         | -1.1910       |
| 2.701         | 0.3236 | 9600  | 3.1022          | -18.5083       | -19.1346         | 0.6100             | 0.6264          | -1.9135        | -1.8508      | -1.2785         | -1.2073       |
| 3.772         | 0.3371 | 10000 | 3.0772          | -18.5843       | -19.2491         | 0.6100             | 0.6649          | -1.9249        | -1.8584      | -1.3345         | -1.2587       |
| 2.7414        | 0.3505 | 10400 | 3.0551          | -18.8305       | -19.5946         | 0.6100             | 0.7641          | -1.9595        | -1.8830      | -1.3824         | -1.3004       |
| 2.0287        | 0.3640 | 10800 | 3.0534          | -18.9934       | -19.7985         | 0.6200             | 0.8051          | -1.9798        | -1.8993      | -1.4355         | -1.3467       |
| 1.0473        | 0.3775 | 11200 | 3.0528          | -19.1581       | -19.9858         | 0.6100             | 0.8277          | -1.9986        | -1.9158      | -1.5109         | -1.4173       |
| 2.8106        | 0.3910 | 11600 | 3.0436          | -19.1763       | -19.9989         | 0.6100             | 0.8226          | -1.9999        | -1.9176      | -1.5138         | -1.4206       |
| 3.0344        | 0.4045 | 12000 | 3.0333          | -19.2526       | -20.1079         | 0.6100             | 0.8553          | -2.0108        | -1.9253      | -1.5628         | -1.4657       |
| 2.1886        | 0.4179 | 12400 | 3.0187          | -19.4500       | -20.3818         | 0.6300             | 0.9318          | -2.0382        | -1.9450      | -1.6246         | -1.5217       |
| 4.1181        | 0.4314 | 12800 | 3.0086          | -19.6204       | -20.6104         | 0.6300             | 0.9900          | -2.0610        | -1.9620      | -1.6886         | -1.5818       |
| 1.6647        | 0.4449 | 13200 | 3.0126          | -19.7773       | -20.7949         | 0.6300             | 1.0176          | -2.0795        | -1.9777      | -1.7307         | -1.6181       |
| 4.8533        | 0.4584 | 13600 | 3.0012          | -19.9001       | -20.9633         | 0.6300             | 1.0632          | -2.0963        | -1.9900      | -1.7437         | -1.6288       |
| 2.9945        | 0.4719 | 14000 | 3.0071          | -19.9831       | -21.0361         | 0.6300             | 1.0529          | -2.1036        | -1.9983      | -1.7839         | -1.6667       |
| 2.9377        | 0.4854 | 14400 | 2.9946          | -20.1165       | -21.2172         | 0.6400             | 1.1007          | -2.1217        | -2.0117      | -1.8386         | -1.7178       |
| 2.7856        | 0.4988 | 14800 | 2.9908          | -20.2830       | -21.4151         | 0.6300             | 1.1322          | -2.1415        | -2.0283      | -1.8720         | -1.7468       |
| 4.9446        | 0.5123 | 15200 | 2.9905          | -20.4144       | -21.5669         | 0.6300             | 1.1525          | -2.1567        | -2.0414      | -1.9057         | -1.7760       |
| 3.2834        | 0.5258 | 15600 | 2.9858          | -20.4428       | -21.5993         | 0.6300             | 1.1565          | -2.1599        | -2.0443      | -1.8928         | -1.7633       |
| 1.8705        | 0.5393 | 16000 | 2.9888          | -20.5922       | -21.7774         | 0.6300             | 1.1853          | -2.1777        | -2.0592      | -1.9340         | -1.8009       |
| 4.0587        | 0.5528 | 16400 | 2.9925          | -20.8812       | -22.1359         | 0.6300             | 1.2547          | -2.2136        | -2.0881      | -2.0019         | -1.8627       |
| 3.0706        | 0.5662 | 16800 | 2.9946          | -21.1005       | -22.4176         | 0.6300             | 1.3171          | -2.2418        | -2.1101      | -2.0533         | -1.9104       |
| 3.152         | 0.5797 | 17200 | 2.9916          | -21.2937       | -22.6723         | 0.6200             | 1.3786          | -2.2672        | -2.1294      | -2.1094         | -1.9627       |
| 1.8856        | 0.5932 | 17600 | 2.9847          | -21.2727       | -22.6463         | 0.6200             | 1.3736          | -2.2646        | -2.1273      | -2.1108         | -1.9637       |
| 1.1291        | 0.6067 | 18000 | 2.9981          | -21.5313       | -22.9507         | 0.6200             | 1.4194          | -2.2951        | -2.1531      | -2.1736         | -2.0212       |
| 2.9894        | 0.6202 | 18400 | 3.0033          | -21.6191       | -23.0276         | 0.6200             | 1.4085          | -2.3028        | -2.1619      | -2.2089         | -2.0543       |
| 3.497         | 0.6337 | 18800 | 3.0252          | -21.8198       | -23.2426         | 0.6200             | 1.4228          | -2.3243        | -2.1820      | -2.2285         | -2.0714       |
| 3.18          | 0.6471 | 19200 | 3.0307          | -21.8887       | -23.3005         | 0.6200             | 1.4117          | -2.3300        | -2.1889      | -2.2462         | -2.0862       |
| 1.9522        | 0.6606 | 19600 | 3.0391          | -21.9179       | -23.3214         | 0.6300             | 1.4035          | -2.3321        | -2.1918      | -2.2476         | -2.0875       |
| 2.4878        | 0.6741 | 20000 | 3.0431          | -22.1021       | -23.5543         | 0.6300             | 1.4522          | -2.3554        | -2.2102      | -2.2969         | -2.1333       |
| 2.3506        | 0.6876 | 20400 | 3.0453          | -22.2379       | -23.7220         | 0.6300             | 1.4841          | -2.3722        | -2.2238      | -2.3258         | -2.1603       |
| 3.9719        | 0.7011 | 20800 | 3.0591          | -22.2718       | -23.7317         | 0.6300             | 1.4599          | -2.3732        | -2.2272      | -2.3263         | -2.1600       |
| 1.4942        | 0.7146 | 21200 | 3.0574          | -22.3226       | -23.8044         | 0.6300             | 1.4819          | -2.3804        | -2.2323      | -2.3352         | -2.1680       |
| 0.8797        | 0.7280 | 21600 | 3.0616          | -22.3419       | -23.8235         | 0.6300             | 1.4816          | -2.3823        | -2.2342      | -2.3394         | -2.1721       |
| 2.8176        | 0.7415 | 22000 | 3.0751          | -22.4788       | -23.9643         | 0.6300             | 1.4855          | -2.3964        | -2.2479      | -2.3767         | -2.2073       |
| 3.3744        | 0.7550 | 22400 | 3.0775          | -22.6028       | -24.1137         | 0.6300             | 1.5109          | -2.4114        | -2.2603      | -2.4146         | -2.2423       |
| 1.9708        | 0.7685 | 22800 | 3.0768          | -22.6249       | -24.1479         | 0.6300             | 1.5231          | -2.4148        | -2.2625      | -2.4216         | -2.2482       |
| 2.1589        | 0.7820 | 23200 | 3.0697          | -22.6570       | -24.1936         | 0.6300             | 1.5367          | -2.4194        | -2.2657      | -2.4323         | -2.2591       |
| 3.0872        | 0.7954 | 23600 | 3.0813          | -22.7174       | -24.2489         | 0.6300             | 1.5315          | -2.4249        | -2.2717      | -2.4430         | -2.2683       |
| 3.9705        | 0.8089 | 24000 | 3.0806          | -22.7644       | -24.3076         | 0.6300             | 1.5432          | -2.4308        | -2.2764      | -2.4598         | -2.2840       |
| 3.5691        | 0.8224 | 24400 | 3.0807          | -22.7627       | -24.2931         | 0.6300             | 1.5304          | -2.4293        | -2.2763      | -2.4621         | -2.2857       |
| 1.4467        | 0.8359 | 24800 | 3.0854          | -22.8132       | -24.3525         | 0.6300             | 1.5393          | -2.4353        | -2.2813      | -2.4742         | -2.2963       |
| 2.7241        | 0.8494 | 25200 | 3.0862          | -22.8300       | -24.3745         | 0.6300             | 1.5445          | -2.4375        | -2.2830      | -2.4770         | -2.2988       |
| 2.7441        | 0.8629 | 25600 | 3.0866          | -22.8450       | -24.3876         | 0.6300             | 1.5427          | -2.4388        | -2.2845      | -2.4823         | -2.3048       |
| 1.4801        | 0.8763 | 26000 | 3.0839          | -22.8522       | -24.4010         | 0.6300             | 1.5488          | -2.4401        | -2.2852      | -2.4827         | -2.3057       |
| 2.5965        | 0.8898 | 26400 | 3.0841          | -22.8629       | -24.4169         | 0.6300             | 1.5540          | -2.4417        | -2.2863      | -2.4877         | -2.3095       |
| 3.6415        | 0.9033 | 26800 | 3.0893          | -22.8830       | -24.4340         | 0.6300             | 1.5510          | -2.4434        | -2.2883      | -2.4894         | -2.3114       |
| 2.0584        | 0.9168 | 27200 | 3.0894          | -22.8879       | -24.4268         | 0.6300             | 1.5389          | -2.4427        | -2.2888      | -2.4917         | -2.3134       |
| 2.5068        | 0.9303 | 27600 | 3.0896          | -22.8936       | -24.4408         | 0.6300             | 1.5472          | -2.4441        | -2.2894      | -2.4922         | -2.3134       |
| 0.677         | 0.9437 | 28000 | 3.0835          | -22.8876       | -24.4472         | 0.6300             | 1.5596          | -2.4447        | -2.2888      | -2.4919         | -2.3134       |
| 2.5931        | 0.9572 | 28400 | 3.0875          | -22.8938       | -24.4419         | 0.6300             | 1.5481          | -2.4442        | -2.2894      | -2.4907         | -2.3117       |
| 4.4413        | 0.9707 | 28800 | 3.0893          | -22.8952       | -24.4383         | 0.6300             | 1.5431          | -2.4438        | -2.2895      | -2.4914         | -2.3131       |
| 2.7584        | 0.9842 | 29200 | 3.0874          | -22.8946       | -24.4410         | 0.6300             | 1.5464          | -2.4441        | -2.2895      | -2.4894         | -2.3112       |
| 4.4406        | 0.9977 | 29600 | 3.0877          | -22.8949       | -24.4444         | 0.6300             | 1.5495          | -2.4444        | -2.2895      | -2.4913         | -2.3131       |
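
One pattern in the table is easy to miss: validation loss reaches its lowest value (2.9847) at step 17600 and then drifts back up to 3.0877, while the reward margin keeps growing. The sketch below illustrates this with a handful of rows copied from the table; the row selection is only a sample chosen for illustration, and the snippet draws no conclusion about which checkpoint is preferable.

```python
# (step, validation_loss, rewards_margin) for a few rows copied from the table above.
rows = [
    (400, 3.4379, -0.0402),
    (8000, 3.1672, 0.4760),
    (14400, 2.9946, 1.1007),
    (17600, 2.9847, 1.3736),
    (20000, 3.0431, 1.4522),
    (29600, 3.0877, 1.5495),
]

# Row with the lowest validation loss among the sampled rows.
best_step, best_loss, best_margin = min(rows, key=lambda r: r[1])
final_step, final_loss, final_margin = rows[-1]

print(f"lowest validation loss: {best_loss:.4f} at step {best_step}")
print(f"final validation loss:  {final_loss:.4f} at step {final_step}")
print(f"loss change after the minimum:   {final_loss - best_loss:+.4f}")
print(f"margin change after the minimum: {final_margin - best_margin:+.4f}")
```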


### Framework versions

- Transformers 4.45.1
- Pytorch 2.2.2
- Datasets 3.0.1
- Tokenizers 0.20.0