mms-1b-swagen-baseline-model

This model is a fine-tuned version of facebook/mms-1b-all on the SWAGEN - SWA dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2281
  • Wer: 0.1941
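
The reported Wer is the word error rate: the word-level edit distance between the model's hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of the metric (the actual evaluation likely used the `evaluate`/`jiwer` implementation — that is an assumption; this pure-Python version computes the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substituted word out of five reference words gives a WER of 0.2, roughly the level this model reaches on the evaluation set.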

Model description

More information needed

Intended uses & limitations

More information needed
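
The card provides no usage snippet; a hedged sketch of how this checkpoint could be loaded for speech recognition via the Transformers `pipeline` API (standard for MMS/Wav2Vec2 CTC checkpoints — the exact preprocessing this model expects is an assumption). The import is deferred so nothing is downloaded until the function is called:

```python
def load_asr(model_id: str = "csikasote/mms-1b-swagen-baseline-model"):
    """Build an automatic-speech-recognition pipeline for this checkpoint.

    Note: downloads the model weights on first call and requires
    `transformers` and `torch` to be installed.
    """
    from transformers import pipeline  # deferred so the sketch is cheap to import
    return pipeline("automatic-speech-recognition", model=model_id)

# Usage (file path is illustrative; audio is resampled to 16 kHz by the pipeline):
# asr = load_asr()
# print(asr("sample.wav")["text"])
```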

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
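<!-- placeholder: not used -->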
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 30.0
  • mixed_precision_training: Native AMP
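
The listed total_train_batch_size follows from the per-device batch size times the gradient-accumulation steps (4 × 2 = 8), and lr_scheduler_type: linear with 100 warmup steps means the learning rate ramps from 0 to 3e-4 over the first 100 optimizer steps, then decays linearly toward 0. A small pure-Python sketch of that schedule (the run used the Transformers `linear` scheduler; `total_steps` below is an illustrative parameter, not the run's actual step count):

```python
def linear_schedule_lr(step: int, base_lr: float = 3e-4,
                       warmup_steps: int = 100, total_steps: int = 2600) -> float:
    """Linear warmup, then linear decay to zero (Transformers-style `linear` schedule)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Decay from base_lr at the end of warmup down to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device batch size x gradient-accumulation steps.
train_batch_size, grad_accum = 4, 2
total_train_batch_size = train_batch_size * grad_accum  # 8
```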

Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 15.3971       | 0.2387 | 100  | 3.5940          | 1.0051 |
| 6.4342        | 0.4773 | 200  | 2.9924          | 0.9869 |
| 3.4197        | 0.7160 | 300  | 0.2737          | 0.2023 |
| 0.56          | 0.9547 | 400  | 0.2543          | 0.1962 |
| 0.5187        | 1.1933 | 500  | 0.2420          | 0.1929 |
| 0.5115        | 1.4320 | 600  | 0.2393          | 0.1947 |
| 0.5086        | 1.6706 | 700  | 0.2360          | 0.1892 |
| 0.4801        | 1.9093 | 800  | 0.2333          | 0.1874 |
| 0.5281        | 2.1480 | 900  | 0.2355          | 0.1958 |
| 0.4683        | 2.3866 | 1000 | 0.2378          | 0.1956 |
| 0.4548        | 2.6253 | 1100 | 0.2283          | 0.1874 |
| 0.4654        | 2.8640 | 1200 | 0.2323          | 0.1892 |
| 0.453         | 3.1026 | 1300 | 0.2288          | 0.1898 |
| 0.4542        | 3.3413 | 1400 | 0.2303          | 0.1902 |
| 0.4621        | 3.5800 | 1500 | 0.2253          | 0.1865 |
| 0.4342        | 3.8186 | 1600 | 0.2267          | 0.1869 |
| 0.466         | 4.0573 | 1700 | 0.2284          | 0.1898 |
| 0.4268        | 4.2959 | 1800 | 0.2325          | 0.1958 |
| 0.4283        | 4.5346 | 1900 | 0.2250          | 0.1886 |
| 0.4407        | 4.7733 | 2000 | 0.2250          | 0.1884 |
| 0.4762        | 5.0119 | 2100 | 0.2277          | 0.1894 |
| 0.4289        | 5.2506 | 2200 | 0.2225          | 0.1872 |
| 0.4391        | 5.4893 | 2300 | 0.2229          | 0.1884 |
| 0.4333        | 5.7279 | 2400 | 0.2229          | 0.1878 |
| 0.4351        | 5.9666 | 2500 | 0.2279          | 0.1902 |
| 0.4065        | 6.2053 | 2600 | 0.2280          | 0.1939 |

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0