Whisper Large V3 Turbo (Swiss German Fine-Tuned with QLoRA)

This repository contains a fine-tuned version of OpenAI's Whisper Large V3 Turbo model, adapted specifically for Swiss German dialects using QLoRA optimization. The model achieves state-of-the-art performance for Swiss German automatic speech recognition (ASR).

Model Summary

  • Base Model: Whisper Large V3 Turbo
  • Fine-Tuning Method: QLoRA (8-bit quantization; see the configuration sketch after this list)
    • Rank: 200
    • Alpha: 16
  • Hardware: 2x NVIDIA A100 80GB GPUs
  • Training Time: 140 hours
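
The listing below is a minimal, illustrative sketch of a QLoRA setup matching the hyperparameters above (8-bit base-model quantization, LoRA rank 200, alpha 16) using Transformers and PEFT. It is not the exact training script used for this model; the target modules and dropout value are assumptions.

```python
# Illustrative QLoRA configuration (not the exact training script for this model).
# Assumptions: attention projections as LoRA target modules, dropout 0.05.
from transformers import WhisperForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 8-bit quantization, as reported above.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3-turbo",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters with the reported rank and alpha.
lora_config = LoraConfig(
    r=200,                                # rank, as reported above
    lora_alpha=16,                        # alpha, as reported above
    target_modules=["q_proj", "v_proj"],  # assumption
    lora_dropout=0.05,                    # assumption
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```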

Performance Metrics

  • Word Error Rate (WER): 17.5%
  • BLEU Score: 65.0

The model's performance has been evaluated across multiple datasets representing diverse dialectal and demographic distributions in Swiss German.
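
As a quick reference, the snippet below shows one way to compute WER and BLEU for a batch of transcriptions with the Hugging Face `evaluate` library; the reference and prediction strings are placeholders, not data from the evaluation sets.

```python
# Sketch of WER and BLEU computation with the `evaluate` library.
# The reference and prediction strings are placeholders.
import evaluate

wer_metric = evaluate.load("wer")
bleu_metric = evaluate.load("sacrebleu")

references = ["heute scheint die sonne den ganzen tag"]
predictions = ["heute scheint die sonne ganzen tag"]

wer = wer_metric.compute(predictions=predictions, references=references)
bleu = bleu_metric.compute(
    predictions=predictions,
    references=[[r] for r in references],  # sacrebleu expects a list of reference lists
)

print(f"WER:  {wer:.4f}")
print(f"BLEU: {bleu['score']:.2f}")  # sacrebleu reports scores on a 0-100 scale
```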

Dataset Summary

The model has been trained and evaluated on a comprehensive suite of Swiss German datasets:

  1. SDS-200 Corpus

    • Size: 200 hours
    • Description: A corpus covering all Swiss German dialects.
  2. STT4SG-350

    • Size: 343 hours
    • Description: Balanced distribution across Swiss German dialects and demographics, including gender representation.
    • Dataset Link
  3. SwissDial-Zh v1.1

    • Size: 24 hours
    • Description: A dataset with balanced representation of Swiss German dialects.
    • Dataset Link
  4. Swiss Parliament Corpus V2 (SPC)

    • Size: 293 hours
    • Description: Parliament recordings across Swiss German dialects.
    • Dataset Link
  5. ASGDTS (All Swiss German Dialects Test Set)

    • Size: 13 hours
    • Description: A stratified dataset closely resembling real-world Swiss German dialect distribution.
    • Dataset Link

Results Across Datasets

WER Scores

| Model                 | WER (All) | WER SD (All) |
|-----------------------|-----------|--------------|
| Turbo V3 Swiss German | 0.1672    | 0.1754       |
| Large V3              | 0.2884    | 0.2829       |
| Turbo V3              | 0.4392    | 0.2777       |

BLEU Scores

| Model                 | BLEU (All) | BLEU SD (All) |
|-----------------------|------------|---------------|
| Turbo V3 Swiss German | 0.65       | 0.3149        |
| Large V3              | 0.5345     | 0.3453        |
| Turbo V3              | 0.3367     | 0.2975        |

Visual Results

  • Figure: WER and BLEU scores across datasets (general results)
  • Figure: WER scores across datasets
  • Figure: BLEU scores across datasets

Usage

This model can be used directly with the Hugging Face Transformers library for tasks requiring Swiss German ASR.
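
For example, a minimal transcription call with the pipeline API might look like the sketch below; the audio path is a placeholder, and the dtype/device handling is an assumption you may want to adapt.

```python
# Minimal sketch: transcribing an audio file with the Transformers ASR pipeline.
# The audio path is a placeholder.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="nizarmichaud/whisper-large-v3-turbo-swissgerman",
    torch_dtype=torch.float16 if device.startswith("cuda") else torch.float32,
    device=device,
)

# return_timestamps=True enables long-form (>30 s) transcription.
result = asr("path/to/swiss_german_audio.wav", return_timestamps=True)
print(result["text"])
```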

Acknowledgments

Special thanks to the creators and maintainers of the datasets used in this work, and to the University of Geneva for granting us access to their High Performance Computing cluster, on which the model was trained.

Citation

If you use this model in your work, please cite this repository as follows:

@misc{whisper-large-v3-turbo-swissgerman,
  author = {Nizar Michaud},
  title = {Whisper Large V3 Turbo Fine-Tuned for Swiss German},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nizarmichaud/whisper-large-v3-turbo-swissgerman},
  doi = {10.57967/hf/3858},
}