Whisper Small Norwegian Bokmål

This model is a fine-tuned version of openai/whisper-small trained on NCC_S_3-NRKonly.

The model is still training; the current checkpoint comes from the middle of a long training run.

Model description

The model is trained on a large corpus of roughly 4,000 hours of speech. The training data is sourced from subtitles from the Norwegian broadcaster NRK.

Intended uses & limitations

The model will be free for everyone to use once training is finished.
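
For readers who want to try a checkpoint, the following is a minimal sketch of loading it through the Hugging Face transformers ASR pipeline. It assumes the checkpoint is published under the repository name shown on this page, that transformers and torch are installed, and that ffmpeg is available for decoding audio files. The helper names and the 30-second chunk length are illustrative choices, not settings taken from this model card.

```python
# Repository name as listed on this model card.
MODEL_ID = "NbAiLab/whisper-small-3NRKonly-nob"


def asr_pipeline_kwargs(model_id: str = MODEL_ID) -> dict:
    """Build the keyword arguments for transformers.pipeline (hypothetical helper)."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # Assumption: split long recordings into 30-second windows.
        "chunk_length_s": 30,
    }


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file; downloads the checkpoint on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline(**asr_pipeline_kwargs())
    return asr(audio_path)["text"]
```

Calling transcribe("sample.wav"), where sample.wav is a placeholder for a local recording, returns the transcription as a string. Because training is still in progress, output quality may change between checkpoints.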

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-06
  • train_batch_size: 128
  • gradient_accumulation_steps: 2
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant with warmup
  • lr_scheduler_warmup_steps: 1000
  • training_steps: 50,000 (currently at step 1,000)
  • mixed_precision_training: fp16
  • deepspeed: true
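
To make the numbers above concrete, here is a small pure-Python sketch of what they imply. It assumes train_batch_size is a per-device value and that "constant with warmup" means a linear ramp from zero to the base learning rate over the warmup steps, then a constant rate, which is how the transformers scheduler of that name behaves. The function names are hypothetical, and the number of devices is not stated in this card.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-06
TRAIN_BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 1000
TRAINING_STEPS = 50_000


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples per optimizer step; num_devices is an assumption, not given in the card."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def learning_rate_at(step: int) -> float:
    """Linear warmup from 0 to LEARNING_RATE, then constant."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    return LEARNING_RATE


def progress(step: int) -> float:
    """Fraction of the planned training steps completed."""
    return step / TRAINING_STEPS


if __name__ == "__main__":
    print("effective batch size (1 device):", effective_batch_size())
    print("learning rate at step 500:", learning_rate_at(500))
    print("progress at step 1000:", progress(1000))
```

Under these assumptions, the reported step 1,000 is where warmup ends and the learning rate first reaches its full value.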

Live Training results

See TensorBoard Metrics

Datasets used to train NbAiLab/whisper-small-3NRKonly-nob

Evaluation results