metadata
library_name: transformers
license: gemma
base_model: google/paligemma-3b-pt-224
tags:
  - generated_from_trainer
model-index:
  - name: paligemma_racer
    results: []

paligemma_racer

This model is a fine-tuned version of google/paligemma-3b-pt-224; the fine-tuning dataset is not specified. It achieves the following result on the evaluation set at the final logged step (4750):

  • Loss: 2.9411
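
For intuition, the evaluation loss can be converted to a perplexity. The following is a minimal sketch that assumes the reported loss is a mean per-token cross-entropy in nats; this is the usual convention for Transformers language-modeling losses, but the card does not state it. The helper name `loss_to_perplexity` is illustrative.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


# Final evaluation loss reported in this card.
final_eval_loss = 2.9411
print(f"perplexity: {loss_to_perplexity(final_eval_loss):.2f}")
```

Under that assumption, a loss of 2.9411 corresponds to a perplexity of roughly 18.9.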

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_hf with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2
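
The hyperparameters above can be sketched as keyword arguments for `transformers.TrainingArguments`, together with the arithmetic behind the total batch size and the linear schedule. This is an illustration under stated assumptions, not the author's training script: `output_dir` is a hypothetical value, a single device is assumed (4 × 4 = 16 matches the reported total), and `total_steps=4778` is an estimate inferred from the step and epoch columns of the training results rather than a logged value.

```python
# Hyperparameters from this card, keyed by TrainingArguments parameter names.
training_kwargs = {
    "output_dir": "paligemma_racer",  # hypothetical; not stated in the card
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_hf",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 100,
    "num_train_epochs": 2,
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Samples per optimizer step (num_devices=1 is an assumption)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(training_kwargs))
# Learning rate halfway through warmup, using the estimated total step count.
print(linear_lr(50, training_kwargs["learning_rate"], training_kwargs["warmup_steps"], 4778))
```

With Transformers installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`.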

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 13.5721       | 0.0209 | 50   | 7.2124          |
| 5.9601        | 0.0419 | 100  | 5.0726          |
| 4.731         | 0.0628 | 150  | 4.4045          |
| 4.2565        | 0.0837 | 200  | 4.0800          |
| 4.0418        | 0.1047 | 250  | 3.8702          |
| 3.866         | 0.1256 | 300  | 3.7370          |
| 3.6864        | 0.1465 | 350  | 3.5834          |
| 3.649         | 0.1675 | 400  | 3.5023          |
| 3.572         | 0.1884 | 450  | 3.4903          |
| 3.4765        | 0.2093 | 500  | 3.4307          |
| 3.406         | 0.2302 | 550  | 3.3801          |
| 3.3997        | 0.2512 | 600  | 3.3027          |
| 3.3602        | 0.2721 | 650  | 3.2871          |
| 3.2852        | 0.2930 | 700  | 3.2509          |
| 3.3183        | 0.3140 | 750  | 3.2354          |
| 3.3281        | 0.3349 | 800  | 3.2133          |
| 3.2545        | 0.3558 | 850  | 3.2098          |
| 3.3173        | 0.3768 | 900  | 3.1909          |
| 3.1993        | 0.3977 | 950  | 3.1646          |
| 3.1705        | 0.4186 | 1000 | 3.1401          |
| 3.1976        | 0.4396 | 1050 | 3.1217          |
| 3.1514        | 0.4605 | 1100 | 3.1340          |
| 3.1832        | 0.4814 | 1150 | 3.1282          |
| 3.1222        | 0.5024 | 1200 | 3.0997          |
| 3.1003        | 0.5233 | 1250 | 3.0788          |
| 3.0833        | 0.5442 | 1300 | 3.0735          |
| 3.099         | 0.5651 | 1350 | 3.0665          |
| 3.1295        | 0.5861 | 1400 | 3.0534          |
| 3.0962        | 0.6070 | 1450 | 3.0392          |
| 3.0589        | 0.6279 | 1500 | 3.0325          |
| 3.075         | 0.6489 | 1550 | 3.0311          |
| 3.034         | 0.6698 | 1600 | 3.0461          |
| 3.0333        | 0.6907 | 1650 | 3.0190          |
| 3.0494        | 0.7117 | 1700 | 3.0174          |
| 3.071         | 0.7326 | 1750 | 3.0123          |
| 3.0147        | 0.7535 | 1800 | 3.0020          |
| 3.0114        | 0.7745 | 1850 | 3.0074          |
| 3.0635        | 0.7954 | 1900 | 3.0224          |
| 2.9939        | 0.8163 | 1950 | 2.9942          |
| 3.0373        | 0.8373 | 2000 | 2.9888          |
| 2.998         | 0.8582 | 2050 | 2.9905          |
| 3.0004        | 0.8791 | 2100 | 2.9883          |
| 2.9477        | 0.9001 | 2150 | 2.9887          |
| 2.9837        | 0.9210 | 2200 | 2.9830          |
| 2.9501        | 0.9419 | 2250 | 2.9788          |
| 3.0235        | 0.9628 | 2300 | 2.9877          |
| 3.0083        | 0.9838 | 2350 | 2.9723          |
| 2.9368        | 1.0047 | 2400 | 2.9775          |
| 2.9975        | 1.0256 | 2450 | 2.9712          |
| 2.9089        | 1.0466 | 2500 | 2.9616          |
| 2.9285        | 1.0675 | 2550 | 2.9669          |
| 2.9627        | 1.0884 | 2600 | 2.9668          |
| 2.9195        | 1.1094 | 2650 | 2.9683          |
| 2.9319        | 1.1303 | 2700 | 2.9607          |
| 2.9009        | 1.1512 | 2750 | 2.9592          |
| 2.9486        | 1.1722 | 2800 | 2.9525          |
| 2.9416        | 1.1931 | 2850 | 2.9532          |
| 2.9223        | 1.2140 | 2900 | 2.9547          |
| 2.9257        | 1.2350 | 2950 | 2.9520          |
| 2.9182        | 1.2559 | 3000 | 2.9516          |
| 2.9255        | 1.2768 | 3050 | 2.9502          |
| 2.9113        | 1.2977 | 3100 | 2.9579          |
| 2.9165        | 1.3187 | 3150 | 2.9584          |
| 2.8901        | 1.3396 | 3200 | 2.9528          |
| 2.921         | 1.3605 | 3250 | 2.9470          |
| 2.9299        | 1.3815 | 3300 | 2.9481          |
| 2.9728        | 1.4024 | 3350 | 2.9458          |
| 2.919         | 1.4233 | 3400 | 2.9446          |
| 2.9132        | 1.4443 | 3450 | 2.9446          |
| 2.9178        | 1.4652 | 3500 | 2.9486          |
| 2.9293        | 1.4861 | 3550 | 2.9450          |
| 2.9514        | 1.5071 | 3600 | 2.9431          |
| 2.9099        | 1.5280 | 3650 | 2.9444          |
| 2.9292        | 1.5489 | 3700 | 2.9449          |
| 2.9336        | 1.5699 | 3750 | 2.9445          |
| 2.8772        | 1.5908 | 3800 | 2.9446          |
| 2.9389        | 1.6117 | 3850 | 2.9444          |
| 2.9618        | 1.6327 | 3900 | 2.9448          |
| 2.9721        | 1.6536 | 3950 | 2.9425          |
| 2.9052        | 1.6745 | 4000 | 2.9406          |
| 2.9245        | 1.6954 | 4050 | 2.9448          |
| 2.9196        | 1.7164 | 4100 | 2.9429          |
| 2.9622        | 1.7373 | 4150 | 2.9408          |
| 2.9199        | 1.7582 | 4200 | 2.9394          |
| 2.9114        | 1.7792 | 4250 | 2.9385          |
| 2.9548        | 1.8001 | 4300 | 2.9402          |
| 2.9263        | 1.8210 | 4350 | 2.9405          |
| 2.9079        | 1.8420 | 4400 | 2.9414          |
| 2.9144        | 1.8629 | 4450 | 2.9367          |
| 2.8985        | 1.8838 | 4500 | 2.9412          |
| 2.8942        | 1.9048 | 4550 | 2.9446          |
| 2.91          | 1.9257 | 4600 | 2.9424          |
| 2.8951        | 1.9466 | 4650 | 2.9414          |
| 2.9054        | 1.9676 | 4700 | 2.9411          |
| 2.8909        | 1.9885 | 4750 | 2.9411          |
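
The results table can also be read programmatically, for example to locate the evaluation step with the lowest validation loss or to estimate how many optimizer steps make up one epoch. The sketch below is illustrative only: the helper names are hypothetical, and for brevity it parses just the last eight rows of the table (training loss, epoch, step, validation loss).

```python
# Last eight rows of the results table, copied verbatim:
# training loss, epoch, step, validation loss.
LOG_EXCERPT = """
2.9079 1.8420 4400 2.9414
2.9144 1.8629 4450 2.9367
2.8985 1.8838 4500 2.9412
2.8942 1.9048 4550 2.9446
2.91 1.9257 4600 2.9424
2.8951 1.9466 4650 2.9414
2.9054 1.9676 4700 2.9411
2.8909 1.9885 4750 2.9411
"""


def parse_rows(text: str) -> list[dict]:
    """Parse whitespace-separated log rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


def best_checkpoint(rows: list[dict]) -> dict:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


def steps_per_epoch(rows: list[dict]) -> float:
    """Estimate optimizer steps per epoch from the last logged row."""
    last = rows[-1]
    return last["step"] / last["epoch"]


rows = parse_rows(LOG_EXCERPT)
best = best_checkpoint(rows)
print(best["step"], best["val_loss"])
print(round(steps_per_epoch(rows)))
```

Over the full table, the lowest validation loss is 2.9367 at step 4450, slightly below the final value of 2.9411 at step 4750.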

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.5.1
  • Datasets 3.1.0
  • Tokenizers 0.20.3
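
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using only the standard library; the PyPI distribution names (`torch` for PyTorch) and the helper name `find_mismatches` are assumptions of this example, and local build suffixes such as `+cu121` are ignored when comparing.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.46.3",
    "torch": "2.5.1",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def find_mismatches(expected: dict, get_version=metadata.version) -> dict:
    """Map each package whose installed version differs to that version.

    Packages that are not installed map to None.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Treat a local build such as "2.5.1+cu121" as matching "2.5.1".
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    for package, installed in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {EXPECTED[package]}, found {installed}")
```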