Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.2028
  • BLEU: 33.77
  • chrF: 52.79
  • WER: 60.8285
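
For readers unfamiliar with the last metric, the sketch below shows one common definition of word error rate: word-level edit distance divided by the number of reference words, expressed as a percentage. It is an illustration only. The card does not state which tool or text normalisation produced the reported scores, and the function name and example sentences are made up. Because extra words in the hypothesis count as insertions, the value can exceed 100, which is consistent with the early rows of the training log.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance over reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic two-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Made-up sentences, not taken from the evaluation set.
print(word_error_rate("the cat sat", "the cat sat"))
print(word_error_rate("good morning", "good morning to you all today"))
```

The second call has four inserted words against a two-word reference, so the rate is above 100.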

Model description

More information needed

Intended uses & limitations

More information needed
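
A minimal usage sketch, assuming the checkpoint loads through the Transformers `automatic-speech-recognition` pipeline as Whisper fine-tunes usually do: Irish (GA) audio goes in, English text comes out. The model ID is taken from the model tree heading on this page; the helper names and the audio path are hypothetical, and no decoding options are set because the card does not document any.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v5.3.1-r"


def load_translator(model_id: str = MODEL_ID):
    """Build a speech pipeline for the checkpoint (downloads weights on first use)."""
    # Imported lazily so that defining these helpers does not require
    # transformers and torch to be installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def translate_file(translator, audio_path: str) -> str:
    """Run the pipeline on one audio file and return the English text."""
    result = translator(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"].strip()


# Hypothetical usage; "irish_sample.wav" is a placeholder path:
#   translator = load_translator()
#   print(translate_file(translator, "irish_sample.wav"))
```

Passing a file path to the pipeline relies on ffmpeg being available for decoding. Keeping the loader separate from the per-file call means the weights are loaded once and reused.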

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
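
To make the schedule concrete, the sketch below works out the learning rate implied by the values listed above: a linear warmup over 3% of the 4000 steps up to 0.0001, then a linear decay to zero. It assumes the usual shape of a linear-with-warmup schedule and that the warmup length is the ratio times the total steps, rounded up; the constant and function names are illustrative, not taken from the training script.

```python
import math

PEAK_LR = 1e-4        # learning_rate
TOTAL_STEPS = 4000    # training_steps
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio

# Assumption: warmup length is the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to the peak during warmup.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay linearly from the peak to 0 over the remaining steps.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 60, WARMUP_STEPS, 2060, TOTAL_STEPS):
    print(step, learning_rate(step))
```

Under these assumptions the peak is reached after 120 steps, so every row of the training log except the first falls in the decay phase.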

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | chrF | WER |
|---|---|---|---|---|---|---|
| 2.4145 | 0.0109 | 100 | 2.1019 | 2.32 | 16.08 | 170.5088 |
| 2.6073 | 0.0219 | 200 | 2.0370 | 5.94 | 23.77 | 158.8924 |
| 2.593 | 0.0328 | 300 | 1.8529 | 3.67 | 21.53 | 238.6312 |
| 2.3123 | 0.0438 | 400 | 1.8500 | 9.01 | 28.13 | 135.0293 |
| 2.3347 | 0.0547 | 500 | 1.7816 | 15.05 | 31.9 | 90.7249 |
| 2.1277 | 0.0657 | 600 | 1.6916 | 14.24 | 32.29 | 88.4286 |
| 2.1836 | 0.0766 | 700 | 1.6517 | 12.15 | 32.7 | 128.1405 |
| 2.0112 | 0.0876 | 800 | 1.6275 | 19.76 | 38.15 | 79.2886 |
| 1.8387 | 0.0985 | 900 | 1.6349 | 17.26 | 38.82 | 91.0851 |
| 1.8335 | 0.1095 | 1000 | 1.5843 | 20.93 | 38.02 | 75.9118 |
| 1.7849 | 0.1204 | 1100 | 1.5863 | 15.98 | 37.5 | 96.5781 |
| 1.5698 | 0.1314 | 1200 | 1.5371 | 16.42 | 39.07 | 103.6020 |
| 1.4759 | 0.1423 | 1300 | 1.5250 | 18.56 | 38.41 | 96.5781 |
| 1.4915 | 0.1533 | 1400 | 1.4862 | 22.05 | 40.15 | 75.1013 |
| 1.6583 | 0.1642 | 1500 | 1.4727 | 18.11 | 39.65 | 95.7677 |
| 1.3981 | 0.1752 | 1600 | 1.4367 | 27.31 | 44.5 | 66.0513 |
| 1.2646 | 0.1861 | 1700 | 1.4574 | 22.85 | 42.19 | 74.4710 |
| 1.2172 | 0.1970 | 1800 | 1.3818 | 20.77 | 42.5 | 82.7105 |
| 1.183 | 0.2080 | 1900 | 1.4380 | 22.75 | 41.28 | 76.7672 |
| 1.1931 | 0.2189 | 2000 | 1.3917 | 23.58 | 41.13 | 77.3075 |
| 1.172 | 0.2299 | 2100 | 1.3892 | 24.58 | 44.4 | 74.3809 |
| 1.0284 | 0.2408 | 2200 | 1.3806 | 23.34 | 44.1 | 78.0279 |
| 0.8507 | 0.2518 | 2300 | 1.3210 | 28.67 | 46.79 | 67.1769 |
| 0.9615 | 0.2627 | 2400 | 1.3103 | 27.95 | 46.8 | 70.0135 |
| 0.8049 | 0.2737 | 2500 | 1.3141 | 29.92 | 48.9 | 67.2220 |
| 0.7639 | 0.2846 | 2600 | 1.3085 | 30.91 | 49.05 | 64.2053 |
| 0.8594 | 0.2956 | 2700 | 1.3378 | 27.8 | 47.84 | 68.8879 |
| 0.7482 | 0.3065 | 2800 | 1.2978 | 30.6 | 48.62 | 64.9257 |
| 0.6941 | 0.3175 | 2900 | 1.3060 | 29.92 | 47.92 | 65.8712 |
| 0.7282 | 0.3284 | 3000 | 1.2959 | 31.09 | 48.13 | 65.3309 |
| 0.6298 | 0.3394 | 3100 | 1.2893 | 29.76 | 48.8 | 67.1769 |
| 0.619 | 0.3503 | 3200 | 1.2388 | 32.61 | 50.27 | 62.0891 |
| 0.6252 | 0.3612 | 3300 | 1.2550 | 32.71 | 50.96 | 62.4493 |
| 0.4699 | 0.3722 | 3400 | 1.2463 | 32.02 | 51.24 | 65.2409 |
| 0.5121 | 0.3831 | 3500 | 1.2214 | 32.26 | 51.29 | 63.7551 |
| 0.5092 | 0.3941 | 3600 | 1.2182 | 32.88 | 51.59 | 62.0891 |
| 0.4365 | 0.4050 | 3700 | 1.2049 | 32.16 | 51.5 | 62.3143 |
| 0.2971 | 0.4160 | 3800 | 1.2201 | 34.45 | 52.78 | 59.7479 |
| 0.389 | 0.4269 | 3900 | 1.2007 | 33.86 | 53.28 | 60.6033 |
| 0.3879 | 0.4379 | 4000 | 1.2028 | 33.77 | 52.79 | 60.8285 |
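
The headline results match the final row (step 4000), while a couple of earlier evaluation steps score slightly better on individual metrics. The sketch below shows one way to check this, using the last five rows of the log copied verbatim; the tuple layout and names are illustrative only.

```python
from collections import namedtuple

EvalRow = namedtuple("EvalRow", "step val_loss bleu chrf wer")

# Last five evaluation rows, copied from the training results.
ROWS = [
    EvalRow(3600, 1.2182, 32.88, 51.59, 62.0891),
    EvalRow(3700, 1.2049, 32.16, 51.50, 62.3143),
    EvalRow(3800, 1.2201, 34.45, 52.78, 59.7479),
    EvalRow(3900, 1.2007, 33.86, 53.28, 60.6033),
    EvalRow(4000, 1.2028, 33.77, 52.79, 60.8285),
]

# Higher is better for BLEU and chrF; lower is better for loss and WER.
best = {
    "bleu": max(ROWS, key=lambda r: r.bleu),
    "chrf": max(ROWS, key=lambda r: r.chrf),
    "val_loss": min(ROWS, key=lambda r: r.val_loss),
    "wer": min(ROWS, key=lambda r: r.wer),
}

for metric, row in best.items():
    print(f"best {metric}: step {row.step} ({getattr(row, metric)})")
```

Within these rows, step 3800 has the best BLEU and WER and step 3900 the best chrF and validation loss, and no earlier row in the log beats them on those metrics. The differences from step 4000 are small.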

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.2.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model size

  • 764M params
  • Tensor type: F32 (Safetensors)

Model tree for ymoslem/whisper-medium-ga2en-v5.3.1-r

  • Base model: openai/whisper-medium (this model is a fine-tuned version of it)

Evaluation results

  • BLEU on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 33.770
  • WER on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 60.828