Whisper Large GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-large on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The datasets are augmented in two ways: noise augmentation and truncation of low-amplitude samples. The best checkpoint (this version), selected by ChrF, is at step 3000 (epoch 0.99) and achieves the following results on the evaluation set:

  • Loss: 1.1742
  • BLEU: 30.16
  • ChrF: 50.72
  • WER: 69.9685
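
A minimal inference sketch (not part of the original card), assuming the standard Transformers ASR pipeline; the audio path is a placeholder for a 16 kHz Irish speech recording:

```python
# Minimal usage sketch: load the checkpoint and translate Irish speech to English.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-large-ga2en-v1.1.1",
    device=device,
)

# Whisper performs translation when the generation task is set to "translate";
# this fine-tuned checkpoint outputs English text for Irish input speech.
result = pipe("audio.wav", generate_kwargs={"task": "translate"})  # placeholder path
print(result["text"])
```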

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
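
The exact augmentation pipeline is not documented here. Below is a hypothetical sketch of the two augmentations named in the introduction (additive noise and truncation of low-amplitude samples), assuming mono waveforms as NumPy arrays; the noise level and amplitude threshold are illustrative, not the values used for this model.

```python
# Hypothetical illustration only: the noise type, SNR, and amplitude threshold
# used to train this model are not documented in the card.
import numpy as np

def add_noise(waveform: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white Gaussian noise at a given signal-to-noise ratio (in dB)."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

def truncate_low_amplitude(waveform: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop samples whose absolute amplitude stays below a threshold
    (one possible reading of 'truncating low-amplitude samples')."""
    keep = np.abs(waveform) >= threshold
    if not keep.any():
        return waveform
    return waveform[keep]
```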

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged equivalent configuration is sketched in the code after this list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.03
  • training_steps: 3000
  • mixed_precision_training: Native AMP
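
A hedged reconstruction of this configuration as Transformers Seq2SeqTrainingArguments; only the values listed above come from the card, and output_dir is a placeholder:

```python
# Sketch of the hyperparameters above as Seq2SeqTrainingArguments.
# The Adam betas/epsilon listed above are the library defaults, so they are
# not set explicitly here.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-ga2en",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,       # effective batch size: 4 x 4 = 16
    lr_scheduler_type="linear",
    warmup_ratio=0.03,                   # the card's 0.03 warmup value, read here as a ratio
    max_steps=3000,
    fp16=True,                           # "Native AMP" mixed precision
)
```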

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU  | ChrF  | WER      |
|:-------------:|:-----:|:----:|:---------------:|:-----:|:-----:|:--------:|
| 3.1833        | 0.03  | 100  | 2.5169          | 2.03  | 16.8  | 215.5786 |
| 2.7632        | 0.07  | 200  | 2.1827          | 7.81  | 24.07 | 113.1022 |
| 2.5687        | 0.1   | 300  | 2.0746          | 6.16  | 24.2  | 158.8474 |
| 2.5615        | 0.13  | 400  | 1.9379          | 8.68  | 26.18 | 120.8465 |
| 2.4554        | 0.16  | 500  | 1.8932          | 12.14 | 28.94 | 103.1067 |
| 2.3546        | 0.2   | 600  | 1.8734          | 14.34 | 29.83 | 91.5353  |
| 2.2804        | 0.23  | 700  | 1.8075          | 13.18 | 33.07 | 105.5380 |
| 2.1408        | 0.26  | 800  | 1.7034          | 13.01 | 33.0  | 89.4642  |
| 2.0411        | 0.3   | 900  | 1.6556          | 16.73 | 34.97 | 91.4453  |
| 1.7766        | 0.33  | 1000 | 1.6505          | 17.21 | 35.54 | 83.5209  |
| 1.7704        | 0.36  | 1100 | 1.5800          | 17.54 | 38.11 | 77.1724  |
| 1.6537        | 0.39  | 1200 | 1.5684          | 14.2  | 35.39 | 95.6326  |
| 1.4841        | 0.43  | 1300 | 1.4970          | 22.96 | 39.35 | 71.3643  |
| 1.641         | 0.46  | 1400 | 1.4693          | 16.3  | 37.69 | 103.7821 |
| 1.393         | 0.49  | 1500 | 1.3923          | 27.21 | 43.87 | 69.3381  |
| 1.249         | 0.53  | 1600 | 1.3876          | 23.33 | 42.26 | 76.5421  |
| 1.3385        | 0.56  | 1700 | 1.3404          | 23.86 | 42.82 | 75.0563  |
| 1.2537        | 0.59  | 1800 | 1.3226          | 17.03 | 41.72 | 100.1801 |
| 1.2891        | 0.62  | 1900 | 1.2995          | 27.26 | 43.62 | 69.1580  |
| 1.226         | 0.66  | 2000 | 1.2605          | 30.89 | 47.34 | 63.5750  |
| 1.1268        | 0.69  | 2100 | 1.2783          | 27.43 | 45.97 | 67.4921  |
| 1.0007        | 0.72  | 2200 | 1.2521          | 27.21 | 47.25 | 71.0041  |
| 0.9565        | 0.76  | 2300 | 1.2219          | 31.65 | 48.07 | 64.2053  |
| 0.9309        | 0.79  | 2400 | 1.2193          | 31.4  | 48.18 | 64.1603  |
| 0.7923        | 0.82  | 2500 | 1.2099          | 28.88 | 48.89 | 69.7884  |
| 0.8199        | 0.85  | 2600 | 1.1972          | 29.37 | 48.07 | 67.3120  |
| 0.6974        | 0.89  | 2700 | 1.1857          | 29.7  | 48.95 | 70.5988  |
| 0.6736        | 0.92  | 2800 | 1.1884          | 29.33 | 48.97 | 72.7150  |
| 0.6826        | 0.95  | 2900 | 1.1834          | 30.76 | 50.11 | 68.1225  |
| 0.7001        | 0.99  | 3000 | 1.1742          | 30.16 | 50.72 | 69.9685  |
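
A hedged sketch of how the BLEU, ChrF, and WER columns above can be computed with the Hugging Face evaluate library; the predictions and references below are placeholders, and the exact evaluation script used for this card is not shown here.

```python
# Placeholder hypothesis/reference pair; the real evaluation set is the one
# described in the introduction.
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")  # requires the jiwer package

predictions = ["the weather is nice today"]
references = ["the weather is lovely today"]

print("BLEU:", bleu.compute(predictions=predictions, references=[[r] for r in references])["score"])
print("ChrF:", chrf.compute(predictions=predictions, references=[[r] for r in references])["score"])
# WER is reported as a percentage in the table above.
print("WER :", wer.compute(predictions=predictions, references=references) * 100)
```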

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.0.1+cu118
  • Datasets 2.18.0
  • Tokenizers 0.15.2
