jkazdan committed
Commit bfb3681
1 Parent(s): 9af1f45

End of training

README.md CHANGED
@@ -17,8 +17,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Loss: 1.0964
- - Num Input Tokens Seen: 22023438
+ - Loss: 1.1082
+ - Num Input Tokens Seen: 22210232
 
 ## Model description
 
@@ -53,84 +53,85 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
 |:-------------:|:------:|:----:|:---------------:|:-----------------:|
 | No log | 0 | 0 | 1.3956 | 0 |
- | 1.6146 | 0.0127 | 5 | 1.3802 | 282512 |
- | 1.4749 | 0.0254 | 10 | 1.2939 | 567704 |
- | 1.5121 | 0.0381 | 15 | 1.2170 | 843528 |
- | 1.2138 | 0.0508 | 20 | 1.1667 | 1122736 |
- | 1.1539 | 0.0635 | 25 | 1.1524 | 1406360 |
- | 1.1923 | 0.0761 | 30 | 1.1370 | 1683184 |
- | 1.0579 | 0.0888 | 35 | 1.1517 | 1964208 |
- | 0.8405 | 0.1015 | 40 | 1.1715 | 2249920 |
- | 0.9761 | 0.1142 | 45 | 1.1893 | 2531816 |
- | 0.898 | 0.1269 | 50 | 1.1797 | 2811232 |
- | 0.783 | 0.1396 | 55 | 1.1955 | 3090480 |
- | 0.7672 | 0.1523 | 60 | 1.1683 | 3370200 |
- | 0.6475 | 0.1650 | 65 | 1.2039 | 3652376 |
- | 0.713 | 0.1777 | 70 | 1.1769 | 3934704 |
- | 0.5421 | 0.1904 | 75 | 1.1872 | 4212344 |
- | 0.7223 | 0.2030 | 80 | 1.1722 | 4490368 |
- | 0.4936 | 0.2157 | 85 | 1.1761 | 4772208 |
- | 0.5502 | 0.2284 | 90 | 1.1787 | 5053400 |
- | 0.5659 | 0.2411 | 95 | 1.1684 | 5332936 |
- | 0.536 | 0.2538 | 100 | 1.1723 | 5611592 |
- | 0.6211 | 0.2665 | 105 | 1.1651 | 5887688 |
- | 0.5204 | 0.2792 | 110 | 1.1623 | 6166520 |
- | 0.6105 | 0.2919 | 115 | 1.1631 | 6447464 |
- | 0.5034 | 0.3046 | 120 | 1.1558 | 6727120 |
- | 0.5015 | 0.3173 | 125 | 1.1583 | 7006104 |
- | 0.5196 | 0.3299 | 130 | 1.1507 | 7280368 |
- | 0.5377 | 0.3426 | 135 | 1.1537 | 7557408 |
- | 0.388 | 0.3553 | 140 | 1.1494 | 7838784 |
- | 0.3997 | 0.3680 | 145 | 1.1504 | 8113120 |
- | 0.3845 | 0.3807 | 150 | 1.1473 | 8392392 |
- | 0.4682 | 0.3934 | 155 | 1.1420 | 8672776 |
- | 0.465 | 0.4061 | 160 | 1.1441 | 8951464 |
- | 0.3882 | 0.4188 | 165 | 1.1363 | 9236640 |
- | 0.3904 | 0.4315 | 170 | 1.1391 | 9516040 |
- | 0.3947 | 0.4442 | 175 | 1.1356 | 9793640 |
- | 0.3724 | 0.4569 | 180 | 1.1318 | 10070408 |
- | 0.3678 | 0.4695 | 185 | 1.1360 | 10354864 |
- | 0.3128 | 0.4822 | 190 | 1.1310 | 10631192 |
- | 0.4527 | 0.4949 | 195 | 1.1307 | 10916872 |
- | 0.4388 | 0.5076 | 200 | 1.1290 | 11191016 |
- | 0.4438 | 0.5203 | 205 | 1.1260 | 11469360 |
- | 0.4665 | 0.5330 | 210 | 1.1267 | 11751200 |
- | 0.4236 | 0.5457 | 215 | 1.1230 | 12024568 |
- | 0.3693 | 0.5584 | 220 | 1.1271 | 12302736 |
- | 0.4446 | 0.5711 | 225 | 1.1196 | 12587232 |
- | 0.3487 | 0.5838 | 230 | 1.1218 | 12869584 |
- | 0.4632 | 0.5964 | 235 | 1.1202 | 13148056 |
- | 0.375 | 0.6091 | 240 | 1.1188 | 13418432 |
- | 0.4151 | 0.6218 | 245 | 1.1226 | 13694672 |
- | 0.392 | 0.6345 | 250 | 1.1168 | 13969648 |
- | 0.369 | 0.6472 | 255 | 1.1204 | 14250976 |
- | 0.3746 | 0.6599 | 260 | 1.1148 | 14534608 |
- | 0.362 | 0.6726 | 265 | 1.1152 | 14809712 |
- | 0.3711 | 0.6853 | 270 | 1.1140 | 15095104 |
- | 0.4216 | 0.6980 | 275 | 1.1119 | 15372264 |
- | 0.3574 | 0.7107 | 280 | 1.1111 | 15645280 |
- | 0.3658 | 0.7234 | 285 | 1.1116 | 15924544 |
- | 0.3106 | 0.7360 | 290 | 1.1076 | 16206560 |
- | 0.4078 | 0.7487 | 295 | 1.1071 | 16486336 |
- | 0.3886 | 0.7614 | 300 | 1.1070 | 16764376 |
- | 0.3268 | 0.7741 | 305 | 1.1083 | 17044776 |
- | 0.4635 | 0.7868 | 310 | 1.1075 | 17325440 |
- | 0.3901 | 0.7995 | 315 | 1.1067 | 17607072 |
- | 0.4522 | 0.8122 | 320 | 1.1053 | 17886392 |
- | 0.4702 | 0.8249 | 325 | 1.1071 | 18165392 |
- | 0.4022 | 0.8376 | 330 | 1.1046 | 18439712 |
- | 0.3274 | 0.8503 | 335 | 1.1051 | 18716912 |
- | 0.3524 | 0.8629 | 340 | 1.1030 | 18998184 |
- | 0.3047 | 0.8756 | 345 | 1.1046 | 19277760 |
- | 0.2845 | 0.8883 | 350 | 1.1045 | 19557592 |
- | 0.42 | 0.9010 | 355 | 1.1012 | 19830712 |
- | 0.3763 | 0.9137 | 360 | 1.1019 | 20110328 |
- | 0.3451 | 0.9264 | 365 | 1.1003 | 20393248 |
- | 0.4428 | 0.9391 | 370 | 1.1006 | 20677544 |
- | 0.4713 | 0.9518 | 375 | 1.0985 | 20957688 |
- | 0.4586 | 0.9645 | 380 | 1.0999 | 21236792 |
- | 0.4426 | 0.9772 | 385 | 1.0989 | 21519496 |
- | 0.3644 | 0.9898 | 390 | 1.0992 | 21799824 |
+ | 1.503 | 0.0126 | 5 | 1.3800 | 282920 |
+ | 1.455 | 0.0252 | 10 | 1.2923 | 559184 |
+ | 1.3333 | 0.0378 | 15 | 1.2129 | 846856 |
+ | 1.2486 | 0.0504 | 20 | 1.1638 | 1126408 |
+ | 1.1551 | 0.0631 | 25 | 1.1452 | 1406528 |
+ | 1.1054 | 0.0757 | 30 | 1.1245 | 1688296 |
+ | 1.0965 | 0.0883 | 35 | 1.1353 | 1968880 |
+ | 1.0551 | 0.1009 | 40 | 1.1321 | 2253600 |
+ | 1.0597 | 0.1135 | 45 | 1.1559 | 2533712 |
+ | 0.9056 | 0.1261 | 50 | 1.1557 | 2816168 |
+ | 0.8464 | 0.1387 | 55 | 1.1733 | 3098832 |
+ | 0.9006 | 0.1513 | 60 | 1.1706 | 3382160 |
+ | 0.9186 | 0.1640 | 65 | 1.1701 | 3666944 |
+ | 0.8413 | 0.1766 | 70 | 1.1751 | 3944648 |
+ | 0.7113 | 0.1892 | 75 | 1.1802 | 4223664 |
+ | 0.7537 | 0.2018 | 80 | 1.1851 | 4508224 |
+ | 0.6394 | 0.2144 | 85 | 1.1706 | 4784136 |
+ | 0.6311 | 0.2270 | 90 | 1.1754 | 5067048 |
+ | 0.6254 | 0.2396 | 95 | 1.1784 | 5349712 |
+ | 0.6607 | 0.2522 | 100 | 1.1751 | 5633272 |
+ | 0.5837 | 0.2649 | 105 | 1.1756 | 5912768 |
+ | 0.6424 | 0.2775 | 110 | 1.1776 | 6191704 |
+ | 0.6406 | 0.2901 | 115 | 1.1754 | 6470568 |
+ | 0.5878 | 0.3027 | 120 | 1.1710 | 6744504 |
+ | 0.5724 | 0.3153 | 125 | 1.1764 | 7024664 |
+ | 0.5836 | 0.3279 | 130 | 1.1698 | 7302984 |
+ | 0.446 | 0.3405 | 135 | 1.1691 | 7585104 |
+ | 0.5857 | 0.3531 | 140 | 1.1700 | 7862824 |
+ | 0.5039 | 0.3658 | 145 | 1.1668 | 8148912 |
+ | 0.5541 | 0.3784 | 150 | 1.1697 | 8433288 |
+ | 0.4768 | 0.3910 | 155 | 1.1661 | 8709864 |
+ | 0.5697 | 0.4036 | 160 | 1.1624 | 8988544 |
+ | 0.4883 | 0.4162 | 165 | 1.1638 | 9266360 |
+ | 0.4343 | 0.4288 | 170 | 1.1564 | 9543464 |
+ | 0.4952 | 0.4414 | 175 | 1.1573 | 9819888 |
+ | 0.4182 | 0.4540 | 180 | 1.1566 | 10103184 |
+ | 0.4055 | 0.4667 | 185 | 1.1518 | 10386496 |
+ | 0.4183 | 0.4793 | 190 | 1.1527 | 10666176 |
+ | 0.4075 | 0.4919 | 195 | 1.1490 | 10945288 |
+ | 0.5048 | 0.5045 | 200 | 1.1506 | 11223232 |
+ | 0.4409 | 0.5171 | 205 | 1.1465 | 11500056 |
+ | 0.4171 | 0.5297 | 210 | 1.1466 | 11780848 |
+ | 0.4131 | 0.5423 | 215 | 1.1399 | 12068144 |
+ | 0.4431 | 0.5549 | 220 | 1.1458 | 12350288 |
+ | 0.506 | 0.5676 | 225 | 1.1378 | 12628160 |
+ | 0.4679 | 0.5802 | 230 | 1.1369 | 12916360 |
+ | 0.3934 | 0.5928 | 235 | 1.1356 | 13195560 |
+ | 0.399 | 0.6054 | 240 | 1.1323 | 13478840 |
+ | 0.3821 | 0.6180 | 245 | 1.1334 | 13758120 |
+ | 0.4344 | 0.6306 | 250 | 1.1333 | 14040032 |
+ | 0.4234 | 0.6432 | 255 | 1.1304 | 14330400 |
+ | 0.3893 | 0.6558 | 260 | 1.1310 | 14609640 |
+ | 0.4944 | 0.6685 | 265 | 1.1288 | 14888960 |
+ | 0.3908 | 0.6811 | 270 | 1.1267 | 15176120 |
+ | 0.4795 | 0.6937 | 275 | 1.1300 | 15451048 |
+ | 0.3164 | 0.7063 | 280 | 1.1254 | 15731384 |
+ | 0.3661 | 0.7189 | 285 | 1.1277 | 16012616 |
+ | 0.4078 | 0.7315 | 290 | 1.1210 | 16294800 |
+ | 0.3492 | 0.7441 | 295 | 1.1256 | 16575776 |
+ | 0.3645 | 0.7567 | 300 | 1.1228 | 16854944 |
+ | 0.3274 | 0.7694 | 305 | 1.1202 | 17128336 |
+ | 0.4235 | 0.7820 | 310 | 1.1261 | 17405248 |
+ | 0.3793 | 0.7946 | 315 | 1.1186 | 17689720 |
+ | 0.3922 | 0.8072 | 320 | 1.1193 | 17960552 |
+ | 0.3589 | 0.8198 | 325 | 1.1177 | 18241224 |
+ | 0.3804 | 0.8324 | 330 | 1.1196 | 18526704 |
+ | 0.4036 | 0.8450 | 335 | 1.1169 | 18799280 |
+ | 0.4325 | 0.8576 | 340 | 1.1151 | 19085152 |
+ | 0.4554 | 0.8703 | 345 | 1.1187 | 19360616 |
+ | 0.4497 | 0.8829 | 350 | 1.1144 | 19636560 |
+ | 0.4199 | 0.8955 | 355 | 1.1148 | 19914344 |
+ | 0.4325 | 0.9081 | 360 | 1.1146 | 20197568 |
+ | 0.4471 | 0.9207 | 365 | 1.1124 | 20475496 |
+ | 0.3495 | 0.9333 | 370 | 1.1119 | 20753488 |
+ | 0.3166 | 0.9459 | 375 | 1.1116 | 21032504 |
+ | 0.4198 | 0.9585 | 380 | 1.1131 | 21311792 |
+ | 0.3419 | 0.9711 | 385 | 1.1107 | 21593296 |
+ | 0.3901 | 0.9838 | 390 | 1.1103 | 21874144 |
+ | 0.4237 | 0.9964 | 395 | 1.1078 | 22154792 |
 
 
 ### Framework versions
 
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:679b30d475f5b80bc56646188055c0eb80062ca05590f53d6db72493c5f4c614
+ oid sha256:1f8f87c73d11ef714f1c604266791dd3a3c2a1c9136b269523b1e7912e047bce
 size 4988025760
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:912e41d3aa100976def9db08d23bccab98e15a60edea443c63cb71d04c2ebbe3
+ oid sha256:3af88fa18ca21883af2389a8612356dcf5b7a5bd4deac9e073071adf2290e9a8
 size 240691728
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:df80ccbf893b3b86c82efeea8649352f7b6cab6143a3c44539d8379796db7202
+ oid sha256:85c30adaedcedf381035532e3691d4dcd604aecbba8f39adef4c9f4d2b92a141
 size 5560
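
For anyone who wants to try the updated weights, here is a minimal loading sketch using Hugging Face Transformers. The repository id below is a placeholder (this commit view does not show it), and since the model card lists no chat template or intended task, the prompt is purely illustrative:

```python
# Minimal sketch for loading the fine-tuned checkpoint with Transformers.
# REPO_ID is a placeholder: substitute the actual model repository id,
# which is not shown in this commit view.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "jkazdan/<repo-name>"  # placeholder, not confirmed by this commit

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",
)

# Quick smoke test: greedy generation from a short prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```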