# paligemma_racer_longer_wu_larger_bs
This model is a fine-tuned version of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.7550
## Model description
More information needed
## Intended uses & limitations
More information needed
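
In the absence of further documentation, the snippet below is a minimal inference sketch, assuming the checkpoint loads with the standard PaliGemma classes in Transformers. The image path and the `caption en` prompt are placeholders, not something this card documents:

```python
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "mateoguaman/paligemma_racer_longer_wu_larger_bs"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder image
prompt = "caption en"              # placeholder task prompt
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Generate, then decode only the newly generated tokens.
output = model.generate(**inputs, max_new_tokens=32)
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```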
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_hf`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 2
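
These values map onto Transformers `TrainingArguments` roughly as follows; this is a sketch, assuming a single-device run (8 per-device × 4 accumulation steps = effective batch size of 32). The dataset and model wiring are omitted since they are not documented here, and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="paligemma_racer_longer_wu_larger_bs",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # x 4 accumulation steps = 32 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    seed=42,
    optim="adamw_hf",                # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=2,
)
```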
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 16.4798 | 0.0419 | 50 | 16.1631 |
| 14.3451 | 0.0837 | 100 | 11.3560 |
| 7.8988 | 0.1256 | 150 | 6.2607 |
| 5.8302 | 0.1674 | 200 | 5.3570 |
| 5.1658 | 0.2093 | 250 | 4.8183 |
| 4.7276 | 0.2512 | 300 | 4.4561 |
| 4.4283 | 0.2930 | 350 | 4.2387 |
| 4.2557 | 0.3349 | 400 | 4.0384 |
| 4.0452 | 0.3767 | 450 | 3.8548 |
| 3.8255 | 0.4186 | 500 | 3.7227 |
| 3.7081 | 0.4604 | 550 | 3.5824 |
| 3.5892 | 0.5023 | 600 | 3.4951 |
| 3.4664 | 0.5442 | 650 | 3.3667 |
| 3.4128 | 0.5860 | 700 | 3.2978 |
| 3.3171 | 0.6279 | 750 | 3.2273 |
| 3.253 | 0.6697 | 800 | 3.2083 |
| 3.1882 | 0.7116 | 850 | 3.1011 |
| 3.1445 | 0.7535 | 900 | 3.0567 |
| 3.1211 | 0.7953 | 950 | 3.0514 |
| 3.0509 | 0.8372 | 1000 | 3.0533 |
| 3.024 | 0.8790 | 1050 | 2.9892 |
| 2.9578 | 0.9209 | 1100 | 2.9652 |
| 2.9466 | 0.9627 | 1150 | 2.9208 |
| 2.8977 | 1.0046 | 1200 | 2.9276 |
| 2.8674 | 1.0465 | 1250 | 2.8737 |
| 2.838 | 1.0883 | 1300 | 2.8679 |
| 2.8106 | 1.1302 | 1350 | 2.8425 |
| 2.7897 | 1.1720 | 1400 | 2.8235 |
| 2.7793 | 1.2139 | 1450 | 2.8163 |
| 2.7553 | 1.2558 | 1500 | 2.8196 |
| 2.7579 | 1.2976 | 1550 | 2.8118 |
| 2.7189 | 1.3395 | 1600 | 2.7977 |
| 2.7381 | 1.3813 | 1650 | 2.8012 |
| 2.738 | 1.4232 | 1700 | 2.7779 |
| 2.7029 | 1.4650 | 1750 | 2.7757 |
| 2.7094 | 1.5069 | 1800 | 2.7749 |
| 2.6883 | 1.5488 | 1850 | 2.7701 |
| 2.6682 | 1.5906 | 1900 | 2.7634 |
| 2.7208 | 1.6325 | 1950 | 2.7659 |
| 2.6934 | 1.6743 | 2000 | 2.7587 |
| 2.6738 | 1.7162 | 2050 | 2.7607 |
| 2.6813 | 1.7581 | 2100 | 2.7605 |
| 2.6845 | 1.7999 | 2150 | 2.7589 |
| 2.6511 | 1.8418 | 2200 | 2.7560 |
| 2.6599 | 1.8836 | 2250 | 2.7565 |
| 2.6527 | 1.9255 | 2300 | 2.7541 |
| 2.6451 | 1.9674 | 2350 | 2.7550 |
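
The validation loss drops steeply over the first ~500 steps and flattens near 2.75 during the second epoch, consistent with the final reported loss of 2.7550. The curve can be re-plotted from the table; a quick sketch, assuming matplotlib is available and using only a subset of the checkpoints above:

```python
import matplotlib.pyplot as plt

# A subset of (step, validation loss) checkpoints from the table above.
steps = [50, 300, 600, 900, 1200, 1500, 1800, 2100, 2350]
losses = [16.1631, 4.4561, 3.4951, 3.0567, 2.9276, 2.8196, 2.7749, 2.7605, 2.7550]

plt.plot(steps, losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("paligemma_racer_longer_wu_larger_bs: validation loss vs. step")
plt.show()
```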
### Framework versions
- Transformers 4.46.3
- Pytorch 2.5.1
- Datasets 3.1.0
- Tokenizers 0.20.3