Whisper Small ATC - ATCText

This model is a fine-tuned version of openai/whisper-small on the ATC dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2486
  • Wer: 10.6129
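The card does not say how the Wer figure is computed. A minimal sketch of the usual definition follows: word-level edit distance divided by the number of reference words, expressed here as a percentage, which is an assumption about the reported value. The function name and the example phrases are hypothetical and are not taken from the evaluation set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical ATC-style utterance: the hypothesis drops the word "flight".
reference = "lufthansa three two one descend flight level eight zero"
hypothesis = "lufthansa three two one descend level eight zero"
example_wer = word_error_rate(reference, hypothesis)  # one error over nine words
```

This scores a single utterance. Evaluation scripts typically pool errors and reference words over the whole evaluation set before dividing, so a corpus-level figure is not the mean of per-utterance values.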

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
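
As an illustration of how the learning-rate settings above interact, the sketch below computes the rate at a given step under a linear schedule with warmup: a ramp from 0 to the peak over 500 steps, then a linear decay to 0 at step 4000. It is a plain-Python approximation of that schedule shape, not the training code itself. The function and constant names are hypothetical.

```python
PEAK_LR = 1e-05       # learning_rate from the list above
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps
TOTAL_STEPS = 4000    # training_steps


def linear_schedule_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: fall linearly from the peak rate to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Rates at the four logged checkpoints (steps 1000, 2000, 3000, 4000).
checkpoint_lrs = {step: linear_schedule_lr(step) for step in (1000, 2000, 3000, 4000)}
```

Under these assumptions the rate has decayed to about 8.6e-06 by the first logged checkpoint at step 1000 and reaches zero at step 4000.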

Training results

Training Loss   Epoch   Step   Validation Loss   Wer
0.2533          0.42    1000   0.3465            16.2868
0.235           0.84    2000   0.2881            13.5237
0.0851          1.27    3000   0.2607            10.6048
0.1317          1.69    4000   0.2486            10.6129
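
The card does not state the training-set size, but the epoch and step columns allow a rough estimate. The sketch below assumes a single device and no gradient accumulation, neither of which is confirmed here, so that each step consumes exactly train_batch_size examples. The names are hypothetical, and the logged epoch values are rounded, so the result is approximate.

```python
TRAIN_BATCH_SIZE = 8  # train_batch_size from the hyperparameter list

# (step, epoch) pairs as logged in the training results.
checkpoints = [(1000, 0.42), (2000, 0.84), (3000, 1.27), (4000, 1.69)]


def estimated_train_examples(step: int, epoch: float,
                             batch_size: int = TRAIN_BATCH_SIZE) -> float:
    """Examples per epoch implied by one logged (step, epoch) pair."""
    # Examples seen so far divided by the fraction of epochs completed.
    return step * batch_size / epoch


estimates = [estimated_train_examples(step, epoch) for step, epoch in checkpoints]
```

All four checkpoints imply roughly 19,000 training examples per epoch under these assumptions.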

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
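
With the frameworks above installed, the checkpoint could presumably be loaded through the Transformers automatic-speech-recognition pipeline. This is a minimal sketch rather than a documented usage recipe. The repository id is the one shown on the model page, the helper name `transcribe` is hypothetical, and decoding an audio file from a path generally requires ffmpeg to be available.

```python
MODEL_ID = "san2003m/whisper-small-atc"  # repository id as listed on the model page


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use and builds an ASR pipeline around it.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # returns a dict whose "text" key holds the transcript
    return result["text"]


# Example call (the file name is a placeholder):
# print(transcribe("sample_atc_recording.wav"))
```

Building the pipeline inside the function keeps the sketch short. For repeated calls it would be more efficient to construct it once and reuse it.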

Additional Information

Licensing Information

The dataset's license follows that of the UWB-ATCC corpus, whose creators released it under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Citation Information

Contributors who prepared, processed, normalized, and uploaded the dataset to Hugging Face:

@article{zuluaga2022how, title={How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications}, author={Zuluaga-Gomez, Juan and Prasad, Amrutha and Nigmatulina, Iuliia and Sarfjoo, Saeed and others}, journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar}, year={2022} }

@article{zuluaga2022bertraffic, title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications}, author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and others}, journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar}, year={2022} }

@article{zuluaga2022atco2, title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications}, author={Zuluaga-Gomez, Juan and Vesel{\'y}, Karel and Sz{\"o}ke, Igor and Motlicek, Petr and others}, journal={arXiv preprint arXiv:2211.04054}, year={2022} }

Authors of the dataset:

@article{vsmidl2019air, title={Air traffic control communication (ATCC) speech corpora and their use for ASR and TTS development}, author={{\v{S}}m{\'\i}dl, Lubo{\v{s}} and {\v{S}}vec, Jan and Tihelka, Daniel and Matou{\v{s}}ek, Jind{\v{r}}ich and Romportl, Jan and Ircing, Pavel}, journal={Language Resources and Evaluation}, volume={53}, number={3}, pages={449--464}, year={2019}, publisher={Springer} }

Model size: 242M params (Safetensors, tensor type F32)