meta-llama/Llama-3.2-3B

This model is a fine-tuned version of meta-llama/Llama-3.2-3B on an unknown dataset. At the last logged evaluation (step 7200, epoch 0.9910), it reaches a validation loss of 0.6899 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 128
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
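
The totals in the list above follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces both in plain Python. It is a minimal illustration, not the training code: the function names are hypothetical, and the schedule assumes linear warmup followed by a half-cosine decay to zero, with the warmup length taken as the warmup ratio times the total step count, rounded up.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 4
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    # Gradient accumulation multiplies the training batch only;
    # evaluation runs one forward pass per device batch.
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step` (assumed shape: linear warmup, cosine decay)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_sizes())
    # total_steps here is illustrative; the card does not state it.
    for step in (0, 500, 1000, 4000, 7000):
        print(step, cosine_lr(step, total_steps=7265))
```

Under these assumptions the learning rate is close to zero by the end of the epoch, which is consistent with the validation loss flattening in the last few logged evaluations.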

Training Loss	Epoch	Step	Validation Loss
0.7972	0.0275	200	0.8336
0.7579	0.0551	400	0.7996
0.8037	0.0826	600	0.7918
0.7333	0.1101	800	0.7879
0.7871	0.1376	1000	0.7818
0.8135	0.1652	1200	0.7736
0.7612	0.1927	1400	0.7699
0.7421	0.2202	1600	0.7643
0.7451	0.2478	1800	0.7595
0.7388	0.2753	2000	0.7556
0.7707	0.3028	2200	0.7523
0.7063	0.3303	2400	0.7481
0.8091	0.3579	2600	0.7440
0.7640	0.3854	2800	0.7407
0.7140	0.4129	3000	0.7370
0.6745	0.4405	3200	0.7339
0.6771	0.4680	3400	0.7295
0.7419	0.4955	3600	0.7257
0.7100	0.5230	3800	0.7223
0.6362	0.5506	4000	0.7189
0.7616	0.5781	4200	0.7159
0.6760	0.6056	4400	0.7126
0.6732	0.6332	4600	0.7094
0.7017	0.6607	4800	0.7067
0.6796	0.6882	5000	0.7038
0.7065	0.7157	5200	0.7012
0.6318	0.7433	5400	0.6987
0.6390	0.7708	5600	0.6965
0.7078	0.7983	5800	0.6949
0.7029	0.8258	6000	0.6933
0.6977	0.8534	6200	0.6921
0.6803	0.8809	6400	0.6911
0.7030	0.9084	6600	0.6905
0.6819	0.9360	6800	0.6901
0.6327	0.9635	7000	0.6899
0.6685	0.9910	7200	0.6899
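
The Epoch and Step columns can be combined to get a rough sense of run length, and the Validation Loss column shows how quickly improvement slows. The sketch below uses a handful of rows copied from the table; the helper names are hypothetical. The results are estimates only: epoch values are rounded to four decimals, and the example count assumes every optimizer step consumed a full batch of 128.

```python
# A few (epoch, step, validation_loss) rows copied from the table above.
ROWS = [
    (0.0275, 200, 0.8336),
    (0.2478, 1800, 0.7595),
    (0.4955, 3600, 0.7257),
    (0.7433, 5400, 0.6987),
    (0.9910, 7200, 0.6899),
]

TOTAL_TRAIN_BATCH_SIZE = 128  # from the hyperparameter list


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    epoch, step, _ = rows[-1]
    return step / epoch


def approx_train_examples(rows, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Rough number of training examples seen in one epoch."""
    return round(steps_per_epoch(rows) * batch_size)


def loss_drop_per_1k_steps(rows):
    """Validation-loss decrease per 1000 steps between consecutive rows."""
    return [
        (s1, (l0 - l1) / (s1 - s0) * 1000)
        for (_, s0, l0), (_, s1, l1) in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    print(f"steps per epoch ~ {steps_per_epoch(ROWS):.0f}")
    print(f"training examples ~ {approx_train_examples(ROWS)}")
    for step, drop in loss_drop_per_1k_steps(ROWS):
        print(f"up to step {step}: {drop:.4f} per 1k steps")
```

With these rows the estimate comes to roughly 7,265 steps per epoch, or on the order of 930,000 training examples, and the per-step improvement in validation loss shrinks steadily across the run.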