# collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9209
- Num Input Tokens Seen: 9209636
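
As a minimal usage sketch (assuming the checkpoint is published on the Hugging Face Hub under the repo id matching this card's title; verify the id before use), the model can be loaded with the standard Transformers API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model card title; confirm it resolves before relying on it.
repo_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; half precision keeps memory manageable
    device_map="auto",           # shard across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```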
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
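
The list above maps directly onto `transformers.TrainingArguments`. A minimal sketch, assuming the Hugging Face `Trainer` was used (the `output_dir` below is a placeholder, not taken from this card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```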
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 1.7181        | 0.0278 | 5    | 1.0178          | 258940            |
| 1.5727        | 0.0556 | 10   | 0.9732          | 523840            |
| 1.4846        | 0.0834 | 15   | 0.9611          | 775600            |
| 1.5742        | 0.1113 | 20   | 0.9573          | 1031780           |
| 1.5061        | 0.1391 | 25   | 0.9571          | 1291404           |
| 1.2746        | 0.1669 | 30   | 0.9544          | 1551680           |
| 1.2702        | 0.1947 | 35   | 0.9557          | 1808156           |
| 1.329         | 0.2225 | 40   | 0.9525          | 2060568           |
| 1.1092        | 0.2503 | 45   | 0.9495          | 2319496           |
| 0.9658        | 0.2782 | 50   | 0.9482          | 2567632           |
| 1.0994        | 0.3060 | 55   | 0.9444          | 2831744           |
| 1.0686        | 0.3338 | 60   | 0.9435          | 3087788           |
| 1.115         | 0.3616 | 65   | 0.9405          | 3340312           |
| 1.0044        | 0.3894 | 70   | 0.9375          | 3602000           |
| 1.1384        | 0.4172 | 75   | 0.9357          | 3868648           |
| 1.0943        | 0.4451 | 80   | 0.9361          | 4121888           |
| 1.0129        | 0.4729 | 85   | 0.9323          | 4375104           |
| 0.9281        | 0.5007 | 90   | 0.9314          | 4629144           |
| 0.9001        | 0.5285 | 95   | 0.9316          | 4881800           |
| 1.0471        | 0.5563 | 100  | 0.9303          | 5142288           |
| 1.0141        | 0.5841 | 105  | 0.9302          | 5398480           |
| 1.0427        | 0.6120 | 110  | 0.9280          | 5651544           |
| 0.9628        | 0.6398 | 115  | 0.9274          | 5904284           |
| 0.8986        | 0.6676 | 120  | 0.9257          | 6160992           |
| 0.9081        | 0.6954 | 125  | 0.9279          | 6427076           |
| 0.957         | 0.7232 | 130  | 0.9241          | 6686176           |
| 0.9556        | 0.7510 | 135  | 0.9246          | 6942364           |
| 0.9609        | 0.7789 | 140  | 0.9244          | 7193836           |
| 0.9889        | 0.8067 | 145  | 0.9228          | 7452352           |
| 0.9009        | 0.8345 | 150  | 0.9231          | 7708728           |
| 0.8942        | 0.8623 | 155  | 0.9217          | 7969644           |
| 0.9304        | 0.8901 | 160  | 0.9216          | 8223032           |
| 0.9462        | 0.9179 | 165  | 0.9212          | 8481188           |
| 0.9904        | 0.9458 | 170  | 0.9204          | 8743924           |
| 0.9147        | 0.9736 | 175  | 0.9204          | 8999112           |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1