Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small for Irish (GA) to English (EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, and SpokenWords datasets. This version is the best checkpoint as selected by chrF, saved at step 2100 (epoch 4.5259). It achieves the following results on the evaluation set (IWSLT-2023 test):

  • Loss: 1.7200
  • BLEU: 29.83
  • chrF: 44.87
  • WER: 64.8807
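
WER is conventionally the word-level edit distance between a hypothesis and its reference (substitutions + deletions + insertions), divided by the number of reference words and multiplied by 100. Because insertions are counted, WER can exceed 100, as in the step-100 row of the training log. The function below is a minimal sentence-level sketch of that standard definition; `word_error_rate` is a hypothetical helper, and the card does not state which tool or text normalization produced its scores. Corpus-level WER sums edits and reference words over all segments rather than averaging per-sentence values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (S + D + I) / N * 100, N = reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous reference prefix
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Three inserted words against a three-word reference give 100% WER.
    print(word_error_rate("the cat sat", "the cat sat on the mat"))
```

A one-word reference with a three-word wrong hypothesis scores 300, which is why a value such as 104.4575 is possible early in training.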

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

  • Training: IWSLT-2023 (train+dev), FLEURS, BiteSize, and SpokenWords
  • Evaluation: IWSLT-2023 (test)

Training procedure

Hardware:

1 NVIDIA A100-SXM4-80GB

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0
  • training_steps: 3000
  • mixed_precision_training: Native AMP
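
With lr_scheduler_type linear and zero warmup steps, the learning rate decays linearly from 1e-4 at step 0 to 0 at step 3000. The function below is a minimal sketch of that schedule, written to match the multiplier used by Transformers' `get_linear_schedule_with_warmup`; the name `linear_lr` and its defaults are illustrative, taken from the hyperparameters above. Under these assumptions, the step-2100 checkpoint was saved while the learning rate was about 3e-5.

```python
def linear_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 0,
              total_steps: int = 3000) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase (unused here because warmup_steps is 0 in this card).
        return base_lr * (step / max(1, warmup_steps))
    # Decay phase: the multiplier falls from 1.0 at the end of warmup to 0.0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    for step in (0, 1000, 2100, 3000):
        print(step, linear_lr(step))
```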

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU  | chrF  | WER      |
|---------------|--------|------|-----------------|-------|-------|----------|
| 1.9416        | 0.2155 | 100  | 1.7899          | 13.09 | 26.48 | 104.4575 |
| 1.5186        | 0.4310 | 200  | 1.5696          | 18.6  | 35.75 | 87.5732  |
| 1.2884        | 0.6466 | 300  | 1.4751          | 17.57 | 37.19 | 87.2580  |
| 1.0729        | 0.8621 | 400  | 1.4345          | 17.92 | 38.23 | 99.2346  |
| 0.4574        | 1.0776 | 500  | 1.5585          | 22.48 | 39.17 | 83.1607  |
| 0.4517        | 1.2931 | 600  | 1.5763          | 22.53 | 38.38 | 81.7650  |
| 0.4385        | 1.5086 | 700  | 1.5852          | 20.05 | 39.46 | 96.8483  |
| 0.3934        | 1.7241 | 800  | 1.5332          | 26.89 | 42.67 | 70.6889  |
| 0.3587        | 1.9397 | 900  | 1.5025          | 28.95 | 44.16 | 64.9707  |
| 0.1528        | 2.1552 | 1000 | 1.5882          | 28.32 | 42.36 | 65.8712  |
| 0.1425        | 2.3707 | 1100 | 1.6056          | 25.5  | 42.42 | 75.0113  |
| 0.1389        | 2.5862 | 1200 | 1.6236          | 26.52 | 42.11 | 70.6439  |
| 0.1532        | 2.8017 | 1300 | 1.6196          | 25.78 | 41.61 | 75.9118  |
| 0.1138        | 3.0172 | 1400 | 1.7185          | 26.01 | 40.88 | 69.6983  |
| 0.0661        | 3.2328 | 1500 | 1.6626          | 28.74 | 43.16 | 71.2292  |
| 0.0625        | 3.4483 | 1600 | 1.6835          | 29.16 | 43.6  | 66.3215  |
| 0.0615        | 3.6638 | 1700 | 1.6756          | 28.93 | 44.08 | 68.3476  |
| 0.0611        | 3.8793 | 1800 | 1.6648          | 27.77 | 43.67 | 72.1747  |
| 0.0344        | 4.0948 | 1900 | 1.7351          | 28.33 | 44.18 | 68.1225  |
| 0.0339        | 4.3103 | 2000 | 1.7715          | 28.9  | 42.98 | 67.0869  |
| 0.0369        | 4.5259 | 2100 | 1.7200          | 29.83 | 44.87 | 64.8807  |
| 0.0326        | 4.7414 | 2200 | 1.7232          | 28.23 | 43.75 | 69.3832  |
| 0.0346        | 4.9569 | 2300 | 1.7688          | 27.72 | 43.1  | 72.8050  |
| 0.0167        | 5.1724 | 2400 | 1.8072          | 28.73 | 43.26 | 67.4471  |
| 0.0146        | 5.3879 | 2500 | 1.7801          | 29.91 | 44.24 | 66.4566  |
| 0.0165        | 5.6034 | 2600 | 1.7782          | 29.34 | 44.33 | 68.2125  |
| 0.0143        | 5.8190 | 2700 | 1.7675          | 27.78 | 43.07 | 72.5799  |
| 0.0106        | 6.0345 | 2800 | 1.7660          | 29.45 | 43.31 | 67.5371  |
| 0.0098        | 6.25   | 2900 | 1.7803          | 27.89 | 42.67 | 71.6344  |
| 0.0087        | 6.4655 | 3000 | 1.7786          | 27.66 | 43.04 | 72.0396  |
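
The checkpoint choice can be reproduced from the table: the sketch below copies the per-step validation metrics and returns the best row for a given metric (higher is better for BLEU and chrF, lower for validation loss and WER). The names `EvalRow`, `ROWS`, and `best_checkpoint` are hypothetical. This is conceptually what Transformers' `metric_for_best_model` and `greater_is_better` training arguments do, although the card does not say whether they were used.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    bleu: float
    chrf: float
    wer: float


# Step, validation loss, BLEU, chrF, and WER copied from the training-results table.
ROWS: List[EvalRow] = [
    EvalRow(100, 1.7899, 13.09, 26.48, 104.4575),
    EvalRow(200, 1.5696, 18.6, 35.75, 87.5732),
    EvalRow(300, 1.4751, 17.57, 37.19, 87.2580),
    EvalRow(400, 1.4345, 17.92, 38.23, 99.2346),
    EvalRow(500, 1.5585, 22.48, 39.17, 83.1607),
    EvalRow(600, 1.5763, 22.53, 38.38, 81.7650),
    EvalRow(700, 1.5852, 20.05, 39.46, 96.8483),
    EvalRow(800, 1.5332, 26.89, 42.67, 70.6889),
    EvalRow(900, 1.5025, 28.95, 44.16, 64.9707),
    EvalRow(1000, 1.5882, 28.32, 42.36, 65.8712),
    EvalRow(1100, 1.6056, 25.5, 42.42, 75.0113),
    EvalRow(1200, 1.6236, 26.52, 42.11, 70.6439),
    EvalRow(1300, 1.6196, 25.78, 41.61, 75.9118),
    EvalRow(1400, 1.7185, 26.01, 40.88, 69.6983),
    EvalRow(1500, 1.6626, 28.74, 43.16, 71.2292),
    EvalRow(1600, 1.6835, 29.16, 43.6, 66.3215),
    EvalRow(1700, 1.6756, 28.93, 44.08, 68.3476),
    EvalRow(1800, 1.6648, 27.77, 43.67, 72.1747),
    EvalRow(1900, 1.7351, 28.33, 44.18, 68.1225),
    EvalRow(2000, 1.7715, 28.9, 42.98, 67.0869),
    EvalRow(2100, 1.7200, 29.83, 44.87, 64.8807),
    EvalRow(2200, 1.7232, 28.23, 43.75, 69.3832),
    EvalRow(2300, 1.7688, 27.72, 43.1, 72.8050),
    EvalRow(2400, 1.8072, 28.73, 43.26, 67.4471),
    EvalRow(2500, 1.7801, 29.91, 44.24, 66.4566),
    EvalRow(2600, 1.7782, 29.34, 44.33, 68.2125),
    EvalRow(2700, 1.7675, 27.78, 43.07, 72.5799),
    EvalRow(2800, 1.7660, 29.45, 43.31, 67.5371),
    EvalRow(2900, 1.7803, 27.89, 42.67, 71.6344),
    EvalRow(3000, 1.7786, 27.66, 43.04, 72.0396),
]

# Direction of improvement for each metric.
HIGHER_IS_BETTER = {"val_loss": False, "bleu": True, "chrf": True, "wer": False}


def best_checkpoint(rows: List[EvalRow], metric: str) -> EvalRow:
    """Return the row with the best value of `metric` (first occurrence wins on ties)."""
    def score(row: EvalRow) -> float:
        return getattr(row, metric)

    if HIGHER_IS_BETTER[metric]:
        return max(rows, key=score)
    return min(rows, key=score)


if __name__ == "__main__":
    for metric in HIGHER_IS_BETTER:
        row = best_checkpoint(ROWS, metric)
        print(f"{metric}: step {row.step} ({getattr(row, metric)})")
```

On these rows, chrF and WER both select step 2100, while BLEU would select step 2500 (29.91) and validation loss would select step 400, so the selection metric changes which checkpoint is published.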

Framework versions

  • Transformers 4.40.2
  • PyTorch 2.2.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
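
To check a local environment against the versions above, the helper below compares installed distribution versions with the card's list using `importlib.metadata.version`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and it uses exact string matching, so a PyTorch build without the `+cu121` suffix is reported as a mismatch. `version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed under "Framework versions" (PyPI distribution names assumed).
CARD_VERSIONS: Dict[str, str] = {
    "transformers": "4.40.2",
    "torch": "2.2.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def version_mismatches(
    expected: Dict[str, str] = CARD_VERSIONS,
    lookup: Callable[[str], str] = version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatched distribution to (expected, installed); installed is None if missing."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for dist, wanted in expected.items():
        try:
            found: Optional[str] = lookup(dist)
        except PackageNotFoundError:
            found = None  # distribution is not installed
        if found != wanted:  # exact string comparison, including local tags such as +cu121
            mismatches[dist] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for dist, (wanted, found) in version_mismatches().items():
        print(f"{dist}: card uses {wanted}, installed {found}")
```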
Model size: 242M params (Safetensors, F32 tensors)

Model ID: ymoslem/whisper-small-ga2en-v1.2-r (fine-tuned from openai/whisper-small)


Evaluation results

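  • BLEU on IWSLT-2023, FLEURS, BiteSize, and SpokenWords (self-reported): 27.660
  • WER on IWSLT-2023, FLEURS, BiteSize, and SpokenWords (self-reported): 72.040

These self-reported values match the final step-3000 row of the training log (BLEU 27.66, WER 72.0396), not the step-2100 checkpoint summarized at the top of this card.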