Whisper base uz
This model is a fine-tuned version of openai/whisper-base, trained on the Common Voice dataset. It achieves the following results on the evaluation set:
- Loss: 0.1052
- Wer: 10.5982
The model works on test audio samples.
Model description
jamshidahmadov/whisper-uz is a fine-tuned version of OpenAI's Whisper base model for Uzbek speech-to-text (STT). It converts spoken Uzbek into written text and is aimed at speech recognition applications such as transcription, voice commands, and speech analytics. It is intended to handle both clean and noisy recordings, with particular attention to the phonetics of Uzbek.
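A minimal usage sketch for the description above, assuming the `transformers` library is installed, the checkpoint is publicly available on the Hugging Face Hub under the name given in this card, and `ffmpeg` is present so audio files can be decoded. The helper names `build_transcriber` and `transcribe` are hypothetical and not part of the model or the library.

```python
MODEL_ID = "jamshidahmadov/whisper-uz"  # repository name as given in this card


def build_transcriber(model_id: str = MODEL_ID):
    """Create an automatic-speech-recognition pipeline (hypothetical helper)."""
    # Imported lazily so the heavy dependency is only loaded when needed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, transcriber=None) -> str:
    """Return the transcription of one audio file as plain text."""
    asr = transcriber or build_transcriber()
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"].strip()


# Example call (downloads the checkpoint on first use):
#     print(transcribe("sample_uz.wav"))  # "sample_uz.wav" is a placeholder path
```

Passing an existing pipeline through the `transcriber` argument avoids reloading the model for every file when transcribing a batch.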
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 2000
- mixed_precision_training: Native AMP
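The relationship between these values can be sketched in plain Python. The sketch assumes training on a single device (which is consistent with 8 × 2 = 16) and assumes that the `linear` scheduler warms up from zero to the peak learning rate and then decays linearly to zero at the final step, as the linear-with-warmup schedule in Transformers does. The function names are illustrative only.

```python
LEARNING_RATE = 1e-5     # learning_rate
TRAIN_BATCH_SIZE = 8     # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2     # gradient_accumulation_steps
WARMUP_STEPS = 500       # lr_scheduler_warmup_steps
TRAINING_STEPS = 2000    # training_steps


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int,
              peak: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup:
        # Warmup phase: rise linearly from 0 to the peak rate.
        return peak * step / warmup
    # Decay phase: fall linearly from the peak rate to 0 at the last step.
    return peak * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 250, 500, 1250, 2000):
        print(f"step {s:>4}: lr = {linear_lr(s):.2e}")
```

Under these assumptions the peak rate of 1e-05 is reached exactly at step 500, a quarter of the way through the 2000 training steps.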
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 0.1303 | 0.5714 | 500 | 0.1232 | 12.7454 |
| 0.0664 | 1.1429 | 1000 | 0.1115 | 11.2883 |
| 0.0742 | 1.7143 | 1500 | 0.1074 | 10.9356 |
| 0.0383 | 2.2857 | 2000 | 0.1052 | 10.5982 |
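For readers unfamiliar with the Wer column, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. It assumes the values in the table are percentages, which is common in Whisper fine-tuning scripts but is not stated in this card. It is not the evaluation code used for this model, and the Uzbek sentences in the usage comment are invented examples.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, from word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four gives a WER of 25 percent:
#     word_error_rate("men bugun maktabga bordim", "men bugun bozorga bordim")
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100 when the hypothesis is much longer than the reference.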
Framework versions
- Transformers 4.47.0
- Pytorch 2.4.0
- Datasets 3.2.0
- Tokenizers 0.21.0