Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The datasets are augmented in two ways: by adding noise and by truncating low-amplitude samples. The best checkpoint by ChrF (this version) is at step 2800, epoch 1.2259, and it achieves the following results on the evaluation set (a sketch for recomputing these metrics follows the list):

  • Loss: 1.3547
  • BLEU: 32.57
  • ChrF: 47.04
  • WER: 62.0891
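
The BLEU, ChrF, and WER figures can be recomputed with the Hugging Face evaluate library. The sketch below shows the metric calls only; the hypothesis and reference strings are placeholders standing in for actual model output and the evaluation set.

```python
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

# Placeholder examples; in practice, run the model over the evaluation set.
predictions = ["the weather is nice today"]
references = ["the weather is lovely today"]

print(bleu.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(chrf.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(wer.compute(predictions=predictions, references=references))
```

Note that WER here compares free translations rather than verbatim transcripts, so it can exceed 100 when hypotheses diverge heavily from the references, as in the early rows of the results table below.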

Model description

This is openai/whisper-small (242M parameters) fine-tuned for direct speech-to-text translation: it takes Irish (GA) speech as input and generates English (EN) text as output.

Intended uses & limitations

The model is intended for translating Irish speech into English text. It has only been evaluated on data drawn from the datasets listed on this card, so quality on other domains, accents, and recording conditions is unknown, and outputs should be reviewed before downstream use.
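
For inference, the checkpoint can be loaded with the transformers pipeline API. This is a minimal sketch; the audio file name is a placeholder, and Whisper expects 16 kHz mono input.

```python
from transformers import pipeline

# "audio_ga.wav" is a placeholder for a recording of Irish speech.
translator = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-small-ga2en-v5.2.1-r",
)
result = translator("audio_ga.wav", generate_kwargs={"task": "translate"})
print(result["text"])  # English translation of the Irish audio
```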

Training and evaluation data

The model is trained and evaluated on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia Irish-English datasets, augmented with added noise and with truncation of low-amplitude samples, as described above. A sketch of these augmentations follows.
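
The card does not give the augmentation parameters, so the following is only a minimal sketch of the two operations it names, using numpy and librosa; the SNR level, the top_db threshold, and the file name are all assumptions.

```python
import numpy as np
import librosa

def add_noise(audio: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Mix Gaussian noise into the signal at a target SNR in dB (assumed level)."""
    signal_power = np.mean(audio**2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def truncate_low_amplitude(audio: np.ndarray, top_db: float = 30.0) -> np.ndarray:
    """Trim leading/trailing audio quieter than top_db below peak (assumed threshold)."""
    trimmed, _ = librosa.effects.trim(audio, top_db=top_db)
    return trimmed

# Whisper models expect 16 kHz mono input; "sample.wav" is a placeholder path.
audio, sr = librosa.load("sample.wav", sr=16000, mono=True)
augmented = add_noise(truncate_low_amplitude(audio))
```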

Training procedure

Hardware

1 NVIDIA A100-SXM4-80GB

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Seq2SeqTrainingArguments follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • generation_max_length: 225
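
These settings map onto transformers' Seq2SeqTrainingArguments roughly as below. This is a reconstruction rather than the author's training script: output_dir and the evaluation/save cadence are assumptions, though eval_steps=100 is consistent with the 100-step cadence of the results table.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch reconstructing the listed hyperparameters (Transformers 4.40.x API).
# The default optimizer, AdamW with betas=(0.9, 0.999) and eps=1e-8,
# matches the optimizer listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-ga2en",  # hypothetical
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=0,
    max_steps=3000,
    fp16=True,                         # Native AMP mixed precision
    predict_with_generate=True,
    generation_max_length=225,
    evaluation_strategy="steps",       # assumption, consistent with the table
    eval_steps=100,
    save_steps=100,
)
```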

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU  | ChrF  | WER      |
|:-------------:|:------:|:----:|:---------------:|:-----:|:-----:|:--------:|
| 2.3533        | 0.0438 | 100  | 1.7789          | 6.29  | 25.08 | 148.7618 |
| 1.9035        | 0.0876 | 200  | 1.5122          | 18.21 | 34.02 | 85.6821  |
| 1.5357        | 0.1313 | 300  | 1.3983          | 14.01 | 33.7  | 93.3363  |
| 1.3056        | 0.1751 | 400  | 1.3447          | 18.12 | 37.35 | 95.0023  |
| 1.1177        | 0.2189 | 500  | 1.3168          | 18.47 | 38.44 | 95.3624  |
| 0.984         | 0.2627 | 600  | 1.3202          | 26.82 | 41.23 | 67.3120  |
| 0.8945        | 0.3065 | 700  | 1.2947          | 26.73 | 42.53 | 67.1319  |
| 0.7508        | 0.3503 | 800  | 1.2476          | 25.67 | 42.06 | 74.2008  |
| 0.7127        | 0.3940 | 900  | 1.2630          | 22.59 | 41.05 | 75.7767  |
| 0.5944        | 0.4378 | 1000 | 1.2726          | 22.37 | 40.31 | 82.4854  |
| 0.4972        | 0.4816 | 1100 | 1.2898          | 22.88 | 42.52 | 82.5304  |
| 0.4517        | 0.5254 | 1200 | 1.2509          | 27.99 | 44.42 | 64.1603  |
| 0.3885        | 0.5692 | 1300 | 1.2887          | 29.58 | 44.8  | 63.1247  |
| 0.3337        | 0.6130 | 1400 | 1.2645          | 30.05 | 45.5  | 62.6294  |
| 0.2852        | 0.6567 | 1500 | 1.2972          | 28.2  | 43.57 | 68.6628  |
| 0.2583        | 0.7005 | 1600 | 1.2716          | 28.21 | 45.06 | 73.6155  |
| 0.2016        | 0.7443 | 1700 | 1.3346          | 27.55 | 43.21 | 74.3809  |
| 0.1883        | 0.7881 | 1800 | 1.3124          | 21.45 | 41.83 | 94.1018  |
| 0.1514        | 0.8319 | 1900 | 1.3178          | 28.2  | 44.09 | 63.7551  |
| 0.1311        | 0.8757 | 2000 | 1.3246          | 27.33 | 43.25 | 74.3359  |
| 0.1128        | 0.9194 | 2100 | 1.3464          | 25.21 | 42.93 | 83.2508  |
| 0.0994        | 0.9632 | 2200 | 1.3315          | 30.51 | 45.74 | 64.7456  |
| 0.0512        | 1.0070 | 2300 | 1.3377          | 30.89 | 46.44 | 63.3498  |
| 0.0447        | 1.0508 | 2400 | 1.3587          | 28.72 | 44.36 | 64.3404  |
| 0.0368        | 1.0946 | 2500 | 1.3619          | 31.53 | 46.56 | 61.9541  |
| 0.0281        | 1.1384 | 2600 | 1.3596          | 30.98 | 46.45 | 70.4638  |
| 0.0273        | 1.1821 | 2700 | 1.3656          | 32.09 | 46.85 | 62.1792  |
| 0.0287        | 1.2259 | 2800 | 1.3547          | 32.57 | 47.04 | 62.0891  |
| 0.025         | 1.2697 | 2900 | 1.3539          | 26.94 | 45.43 | 81.1796  |
| 0.0263        | 1.3135 | 3000 | 1.3512          | 30.11 | 46.73 | 71.4993  |

Framework versions

  • Transformers 4.40.2
  • PyTorch 2.2.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Evaluation results

  • BLEU on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia + augmented (self-reported): 30.110
  • WER on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia + augmented (self-reported): 71.499

These self-reported figures correspond to the final checkpoint at step 3000 in the results table, not to the best ChrF checkpoint at step 2800 reported at the top of this card.