---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1144
- Num Input Tokens Seen: 63047896

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
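The list above implies two derived quantities: the effective batch size and the learning-rate curve. The plain-Python sketch below recomputes both without depending on Transformers. It rests on three assumptions, none stated in this card. First, training ran on a single device, which is consistent with 8 × 16 = 128. Second, the warmup length is `warmup_ratio` times the total optimizer steps, rounded up; as far as we know, this matches how Transformers' `TrainingArguments` derives it. Third, the run had roughly 1,169 optimizer steps, an approximate figure inferred from the Step and Epoch columns of the results table. The helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8   # per device
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05
NUM_DEVICES = 1        # assumption: inferred from 8 * 16 == 128


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences that contribute to one optimizer update."""
    return per_device * accum * devices


def warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up (assumed convention)."""
    return math.ceil(num_training_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, base_lr: float, num_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over num_warmup steps, then constant."""
    if step < num_warmup:
        return base_lr * step / max(1, num_warmup)
    return base_lr


if __name__ == "__main__":
    total_steps = 1169  # assumption: approximate, read off the results table
    n_warmup = warmup_steps(total_steps, WARMUP_RATIO)
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    print(n_warmup)
    for s in (0, 30, n_warmup, 1000):
        print(s, constant_with_warmup_lr(s, LEARNING_RATE, n_warmup))
```

Under these assumptions, warmup covers only about the first 59 updates. After that, every update uses the full 8e-06 learning rate until the single epoch ends, because the schedule never decays.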

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6564        | 0.0043 | 5    | 1.3937          | 275648            |
| 1.625         | 0.0086 | 10   | 1.3764          | 549096            |
| 1.6127        | 0.0128 | 15   | 1.3400          | 816216            |
| 1.4661        | 0.0171 | 20   | 1.2820          | 1085976           |
| 1.4385        | 0.0214 | 25   | 1.2418          | 1354672           |
| 1.4354        | 0.0257 | 30   | 1.2025          | 1622592           |
| 1.3057        | 0.0299 | 35   | 1.1783          | 1881872           |
| 1.2292        | 0.0342 | 40   | 1.1762          | 2153032           |
| 1.1657        | 0.0385 | 45   | 1.1750          | 2421888           |
| 1.0488        | 0.0428 | 50   | 1.1851          | 2689960           |
| 0.9864        | 0.0471 | 55   | 1.1953          | 2949744           |
| 0.8158        | 0.0513 | 60   | 1.2533          | 3220936           |
| 0.688         | 0.0556 | 65   | 1.2690          | 3491824           |
| 0.7325        | 0.0599 | 70   | 1.2536          | 3769288           |
| 0.5427        | 0.0642 | 75   | 1.2794          | 4039720           |
| 0.6183        | 0.0684 | 80   | 1.2494          | 4306072           |
| 0.4428        | 0.0727 | 85   | 1.2401          | 4573368           |
| 0.4545        | 0.0770 | 90   | 1.2470          | 4842648           |
| 0.4023        | 0.0813 | 95   | 1.2484          | 5114144           |
| 0.5039        | 0.0856 | 100  | 1.2291          | 5384112           |
| 0.3227        | 0.0898 | 105  | 1.2389          | 5656336           |
| 0.3485        | 0.0941 | 110  | 1.2460          | 5921680           |
| 0.3794        | 0.0984 | 115  | 1.2286          | 6189560           |
| 0.2328        | 0.1027 | 120  | 1.2411          | 6461112           |
| 0.3787        | 0.1069 | 125  | 1.2257          | 6724808           |
| 0.3868        | 0.1112 | 130  | 1.2166          | 6996328           |
| 0.3563        | 0.1155 | 135  | 1.2265          | 7265576           |
| 0.361         | 0.1198 | 140  | 1.2118          | 7535448           |
| 0.2624        | 0.1241 | 145  | 1.2149          | 7809568           |
| 0.3361        | 0.1283 | 150  | 1.2080          | 8075824           |
| 0.2209        | 0.1326 | 155  | 1.2176          | 8344136           |
| 0.3692        | 0.1369 | 160  | 1.2077          | 8617576           |
| 0.3648        | 0.1412 | 165  | 1.2167          | 8896208           |
| 0.3819        | 0.1454 | 170  | 1.1981          | 9168616           |
| 0.3246        | 0.1497 | 175  | 1.2059          | 9439392           |
| 0.2592        | 0.1540 | 180  | 1.2013          | 9712992           |
| 0.2463        | 0.1583 | 185  | 1.2000          | 9970816           |
| 0.1901        | 0.1625 | 190  | 1.1996          | 10238784          |
| 0.2588        | 0.1668 | 195  | 1.1978          | 10513696          |
| 0.346         | 0.1711 | 200  | 1.1957          | 10788672          |
| 0.1714        | 0.1754 | 205  | 1.1987          | 11064928          |
| 0.2532        | 0.1797 | 210  | 1.2013          | 11327736          |
| 0.2951        | 0.1839 | 215  | 1.1940          | 11593984          |
| 0.224         | 0.1882 | 220  | 1.2007          | 11870624          |
| 0.1832        | 0.1925 | 225  | 1.1991          | 12144200          |
| 0.3316        | 0.1968 | 230  | 1.1969          | 12410456          |
| 0.2406        | 0.2010 | 235  | 1.1887          | 12682736          |
| 0.1945        | 0.2053 | 240  | 1.1951          | 12948936          |
| 0.2001        | 0.2096 | 245  | 1.1937          | 13220632          |
| 0.2604        | 0.2139 | 250  | 1.1890          | 13495880          |
| 0.2195        | 0.2182 | 255  | 1.1908          | 13768416          |
| 0.2426        | 0.2224 | 260  | 1.1886          | 14038912          |
| 0.2231        | 0.2267 | 265  | 1.1897          | 14303120          |
| 0.215         | 0.2310 | 270  | 1.1830          | 14569728          |
| 0.2297        | 0.2353 | 275  | 1.1879          | 14842848          |
| 0.2042        | 0.2395 | 280  | 1.1844          | 15117944          |
| 0.2103        | 0.2438 | 285  | 1.1818          | 15392440          |
| 0.2358        | 0.2481 | 290  | 1.1812          | 15660888          |
| 0.2139        | 0.2524 | 295  | 1.1770          | 15932928          |
| 0.2129        | 0.2567 | 300  | 1.1832          | 16206296          |
| 0.2495        | 0.2609 | 305  | 1.1813          | 16476064          |
| 0.2447        | 0.2652 | 310  | 1.1746          | 16744344          |
| 0.2493        | 0.2695 | 315  | 1.1787          | 17017328          |
| 0.1736        | 0.2738 | 320  | 1.1757          | 17293648          |
| 0.2021        | 0.2780 | 325  | 1.1751          | 17564352          |
| 0.1906        | 0.2823 | 330  | 1.1791          | 17832488          |
| 0.1566        | 0.2866 | 335  | 1.1729          | 18101936          |
| 0.2381        | 0.2909 | 340  | 1.1767          | 18366272          |
| 0.1651        | 0.2952 | 345  | 1.1728          | 18638096          |
| 0.2087        | 0.2994 | 350  | 1.1715          | 18902976          |
| 0.1556        | 0.3037 | 355  | 1.1758          | 19179072          |
| 0.1836        | 0.3080 | 360  | 1.1743          | 19451392          |
| 0.206         | 0.3123 | 365  | 1.1675          | 19719608          |
| 0.1513        | 0.3165 | 370  | 1.1694          | 19993216          |
| 0.1117        | 0.3208 | 375  | 1.1653          | 20262080          |
| 0.1809        | 0.3251 | 380  | 1.1670          | 20529968          |
| 0.1587        | 0.3294 | 385  | 1.1727          | 20797888          |
| 0.2179        | 0.3337 | 390  | 1.1644          | 21063696          |
| 0.1565        | 0.3379 | 395  | 1.1639          | 21340488          |
| 0.1914        | 0.3422 | 400  | 1.1622          | 21610344          |
| 0.189         | 0.3465 | 405  | 1.1608          | 21888272          |
| 0.2155        | 0.3508 | 410  | 1.1624          | 22157912          |
| 0.1637        | 0.3550 | 415  | 1.1615          | 22428144          |
| 0.1893        | 0.3593 | 420  | 1.1611          | 22697424          |
| 0.1579        | 0.3636 | 425  | 1.1582          | 22970232          |
| 0.1733        | 0.3679 | 430  | 1.1619          | 23236448          |
| 0.2003        | 0.3722 | 435  | 1.1568          | 23509592          |
| 0.203         | 0.3764 | 440  | 1.1562          | 23781360          |
| 0.2085        | 0.3807 | 445  | 1.1581          | 24053160          |
| 0.2108        | 0.3850 | 450  | 1.1530          | 24327256          |
| 0.1651        | 0.3893 | 455  | 1.1540          | 24591984          |
| 0.1421        | 0.3935 | 460  | 1.1583          | 24864504          |
| 0.1734        | 0.3978 | 465  | 1.1491          | 25138208          |
| 0.247         | 0.4021 | 470  | 1.1512          | 25406984          |
| 0.214         | 0.4064 | 475  | 1.1536          | 25672240          |
| 0.2141        | 0.4107 | 480  | 1.1522          | 25938408          |
| 0.1223        | 0.4149 | 485  | 1.1535          | 26207792          |
| 0.1772        | 0.4192 | 490  | 1.1535          | 26472776          |
| 0.2028        | 0.4235 | 495  | 1.1473          | 26747664          |
| 0.1715        | 0.4278 | 500  | 1.1493          | 27015688          |
| 0.2138        | 0.4320 | 505  | 1.1453          | 27278504          |
| 0.1572        | 0.4363 | 510  | 1.1478          | 27547848          |
| 0.1712        | 0.4406 | 515  | 1.1450          | 27809848          |
| 0.213         | 0.4449 | 520  | 1.1468          | 28083624          |
| 0.2085        | 0.4491 | 525  | 1.1469          | 28357112          |
| 0.1312        | 0.4534 | 530  | 1.1428          | 28624624          |
| 0.1982        | 0.4577 | 535  | 1.1426          | 28895280          |
| 0.1566        | 0.4620 | 540  | 1.1468          | 29159584          |
| 0.1547        | 0.4663 | 545  | 1.1453          | 29429200          |
| 0.2244        | 0.4705 | 550  | 1.1428          | 29697536          |
| 0.1952        | 0.4748 | 555  | 1.1441          | 29966616          |
| 0.1646        | 0.4791 | 560  | 1.1420          | 30234376          |
| 0.1243        | 0.4834 | 565  | 1.1418          | 30509392          |
| 0.1995        | 0.4876 | 570  | 1.1419          | 30785368          |
| 0.1989        | 0.4919 | 575  | 1.1398          | 31060456          |
| 0.2007        | 0.4962 | 580  | 1.1386          | 31326208          |
| 0.1472        | 0.5005 | 585  | 1.1393          | 31594472          |
| 0.1106        | 0.5048 | 590  | 1.1399          | 31860304          |
| 0.2542        | 0.5090 | 595  | 1.1378          | 32132960          |
| 0.2023        | 0.5133 | 600  | 1.1358          | 32408064          |
| 0.1613        | 0.5176 | 605  | 1.1389          | 32680560          |
| 0.1493        | 0.5219 | 610  | 1.1369          | 32954248          |
| 0.1255        | 0.5261 | 615  | 1.1378          | 33215640          |
| 0.0936        | 0.5304 | 620  | 1.1401          | 33485632          |
| 0.1824        | 0.5347 | 625  | 1.1382          | 33756656          |
| 0.2243        | 0.5390 | 630  | 1.1390          | 34026464          |
| 0.1573        | 0.5433 | 635  | 1.1361          | 34299816          |
| 0.1638        | 0.5475 | 640  | 1.1352          | 34570872          |
| 0.1157        | 0.5518 | 645  | 1.1360          | 34838312          |
| 0.1701        | 0.5561 | 650  | 1.1342          | 35106056          |
| 0.2314        | 0.5604 | 655  | 1.1337          | 35374072          |
| 0.1754        | 0.5646 | 660  | 1.1351          | 35634464          |
| 0.1703        | 0.5689 | 665  | 1.1320          | 35907424          |
| 0.2359        | 0.5732 | 670  | 1.1314          | 36170096          |
| 0.2349        | 0.5775 | 675  | 1.1329          | 36442024          |
| 0.1305        | 0.5818 | 680  | 1.1308          | 36706288          |
| 0.1876        | 0.5860 | 685  | 1.1312          | 36973688          |
| 0.1347        | 0.5903 | 690  | 1.1320          | 37241296          |
| 0.2262        | 0.5946 | 695  | 1.1314          | 37512872          |
| 0.1998        | 0.5989 | 700  | 1.1326          | 37782680          |
| 0.1055        | 0.6031 | 705  | 1.1304          | 38053608          |
| 0.2393        | 0.6074 | 710  | 1.1302          | 38325008          |
| 0.1775        | 0.6117 | 715  | 1.1307          | 38589416          |
| 0.2197        | 0.6160 | 720  | 1.1277          | 38853576          |
| 0.166         | 0.6203 | 725  | 1.1256          | 39122008          |
| 0.1593        | 0.6245 | 730  | 1.1300          | 39396560          |
| 0.1923        | 0.6288 | 735  | 1.1328          | 39666480          |
| 0.1976        | 0.6331 | 740  | 1.1306          | 39934776          |
| 0.1625        | 0.6374 | 745  | 1.1272          | 40198928          |
| 0.1268        | 0.6416 | 750  | 1.1290          | 40474816          |
| 0.219         | 0.6459 | 755  | 1.1289          | 40738928          |
| 0.2275        | 0.6502 | 760  | 1.1235          | 41014112          |
| 0.0704        | 0.6545 | 765  | 1.1265          | 41291400          |
| 0.1353        | 0.6588 | 770  | 1.1284          | 41567064          |
| 0.1344        | 0.6630 | 775  | 1.1257          | 41835856          |
| 0.1868        | 0.6673 | 780  | 1.1241          | 42108416          |
| 0.2027        | 0.6716 | 785  | 1.1269          | 42376552          |
| 0.1119        | 0.6759 | 790  | 1.1281          | 42639272          |
| 0.1379        | 0.6801 | 795  | 1.1261          | 42911096          |
| 0.2652        | 0.6844 | 800  | 1.1265          | 43184912          |
| 0.1232        | 0.6887 | 805  | 1.1253          | 43452840          |
| 0.1459        | 0.6930 | 810  | 1.1239          | 43719024          |
| 0.1376        | 0.6973 | 815  | 1.1257          | 43982968          |
| 0.1484        | 0.7015 | 820  | 1.1273          | 44251808          |
| 0.1617        | 0.7058 | 825  | 1.1248          | 44520088          |
| 0.1703        | 0.7101 | 830  | 1.1240          | 44782312          |
| 0.2121        | 0.7144 | 835  | 1.1246          | 45055208          |
| 0.1987        | 0.7186 | 840  | 1.1221          | 45329256          |
| 0.1687        | 0.7229 | 845  | 1.1218          | 45600800          |
| 0.1417        | 0.7272 | 850  | 1.1245          | 45871688          |
| 0.2093        | 0.7315 | 855  | 1.1243          | 46145112          |
| 0.1644        | 0.7358 | 860  | 1.1260          | 46416248          |
| 0.17          | 0.7400 | 865  | 1.1265          | 46685400          |
| 0.197         | 0.7443 | 870  | 1.1215          | 46949488          |
| 0.2171        | 0.7486 | 875  | 1.1240          | 47221208          |
| 0.148         | 0.7529 | 880  | 1.1252          | 47503016          |
| 0.1472        | 0.7571 | 885  | 1.1223          | 47771504          |
| 0.0773        | 0.7614 | 890  | 1.1200          | 48043096          |
| 0.1024        | 0.7657 | 895  | 1.1236          | 48310640          |
| 0.0715        | 0.7700 | 900  | 1.1226          | 48579272          |
| 0.161         | 0.7742 | 905  | 1.1208          | 48845664          |
| 0.2209        | 0.7785 | 910  | 1.1225          | 49116328          |
| 0.2193        | 0.7828 | 915  | 1.1227          | 49384192          |
| 0.1065        | 0.7871 | 920  | 1.1213          | 49653128          |
| 0.1488        | 0.7914 | 925  | 1.1221          | 49933168          |
| 0.2447        | 0.7956 | 930  | 1.1200          | 50208440          |
| 0.1157        | 0.7999 | 935  | 1.1216          | 50474600          |
| 0.1756        | 0.8042 | 940  | 1.1227          | 50741896          |
| 0.1873        | 0.8085 | 945  | 1.1186          | 51008128          |
| 0.1736        | 0.8127 | 950  | 1.1199          | 51282936          |
| 0.1495        | 0.8170 | 955  | 1.1226          | 51545616          |
| 0.1663        | 0.8213 | 960  | 1.1194          | 51809832          |
| 0.1343        | 0.8256 | 965  | 1.1184          | 52083672          |
| 0.1252        | 0.8299 | 970  | 1.1195          | 52355144          |
| 0.111         | 0.8341 | 975  | 1.1202          | 52630616          |
| 0.1025        | 0.8384 | 980  | 1.1203          | 52908440          |
| 0.1644        | 0.8427 | 985  | 1.1195          | 53182968          |
| 0.1614        | 0.8470 | 990  | 1.1192          | 53448960          |
| 0.1156        | 0.8512 | 995  | 1.1206          | 53722632          |
| 0.1378        | 0.8555 | 1000 | 1.1192          | 53998512          |
| 0.1776        | 0.8598 | 1005 | 1.1169          | 54263744          |
| 0.2257        | 0.8641 | 1010 | 1.1174          | 54526592          |
| 0.1631        | 0.8684 | 1015 | 1.1210          | 54792792          |
| 0.1759        | 0.8726 | 1020 | 1.1169          | 55069680          |
| 0.1197        | 0.8769 | 1025 | 1.1142          | 55350464          |
| 0.1768        | 0.8812 | 1030 | 1.1170          | 55621960          |
| 0.2284        | 0.8855 | 1035 | 1.1190          | 55896744          |
| 0.1251        | 0.8897 | 1040 | 1.1156          | 56164720          |
| 0.1812        | 0.8940 | 1045 | 1.1176          | 56434136          |
| 0.234         | 0.8983 | 1050 | 1.1171          | 56709136          |
| 0.1637        | 0.9026 | 1055 | 1.1145          | 56974616          |
| 0.1279        | 0.9069 | 1060 | 1.1162          | 57242824          |
| 0.1495        | 0.9111 | 1065 | 1.1177          | 57511368          |
| 0.155         | 0.9154 | 1070 | 1.1181          | 57774344          |
| 0.2235        | 0.9197 | 1075 | 1.1162          | 58043560          |
| 0.126         | 0.9240 | 1080 | 1.1158          | 58312920          |
| 0.1786        | 0.9282 | 1085 | 1.1173          | 58587160          |
| 0.1193        | 0.9325 | 1090 | 1.1163          | 58858704          |
| 0.1405        | 0.9368 | 1095 | 1.1142          | 59120792          |
| 0.2019        | 0.9411 | 1100 | 1.1165          | 59388184          |
| 0.2109        | 0.9454 | 1105 | 1.1159          | 59648456          |
| 0.1786        | 0.9496 | 1110 | 1.1163          | 59925824          |
| 0.1741        | 0.9539 | 1115 | 1.1162          | 60199640          |
| 0.1791        | 0.9582 | 1120 | 1.1137          | 60469672          |
| 0.1162        | 0.9625 | 1125 | 1.1154          | 60742672          |
| 0.1385        | 0.9667 | 1130 | 1.1159          | 61012624          |
| 0.1489        | 0.9710 | 1135 | 1.1142          | 61279728          |
| 0.1068        | 0.9753 | 1140 | 1.1141          | 61546392          |
| 0.1712        | 0.9796 | 1145 | 1.1140          | 61811624          |
| 0.1502        | 0.9839 | 1150 | 1.1128          | 62076504          |
| 0.1743        | 0.9881 | 1155 | 1.1140          | 62348416          |
| 0.1894        | 0.9924 | 1160 | 1.1132          | 62611880          |
| 0.1271        | 0.9967 | 1165 | 1.1129          | 62884000          |
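A small parser can locate the evaluation step with the lowest validation loss in the table above. This is useful, for example, when comparing against intermediate checkpoints, although the card does not say which checkpoints were kept. The sketch assumes the column order shown here (Training Loss, Epoch, Step, Validation Loss, Input Tokens Seen). The function names are hypothetical, and the excerpt contains rows copied verbatim from the table.

```python
from typing import List, Tuple


def parse_results_table(markdown: str) -> List[Tuple[int, float]]:
    """Return (step, validation_loss) pairs from a Trainer-style results table."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the alignment row (e.g. ":----:").
        if len(cells) < 4 or not cells[2].isdigit():
            continue
        rows.append((int(cells[2]), float(cells[3])))
    return rows


def best_step(rows: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Row with the lowest validation loss; the earliest step wins on ties."""
    return min(rows, key=lambda r: r[1])


if __name__ == "__main__":
    # A few rows copied verbatim from the table above.
    excerpt = """
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.1657        | 0.0385 | 45   | 1.1750          | 2421888           |
| 0.1502        | 0.9839 | 1150 | 1.1128          | 62076504          |
| 0.1271        | 0.9967 | 1165 | 1.1129          | 62884000          |
"""
    rows = parse_results_table(excerpt)
    print(best_step(rows))
```

Applied to the full table, this selects step 1150 (validation loss 1.1128). The final evaluation loss of 1.1144 reported at the top of the card does not appear as a row in the table.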


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1