---
extra_gated_prompt: >-
  This is a BETA model. To use this model, you agree to the [licensing
  terms](license.md).
language:
- 'no'
license: apache-2.0
tags:
- audio
- asr
- automatic-speech-recognition
- hf-asr-leaderboard
model-index:
- name: Small Scream - April Beta
  results: []
---
# Small Scream - April Beta
This model is a fine-tuned version of openai/whisper-small on the NbAiLab/NCC_speech_all_v5 dataset, evaluated with a beam size of 5. It achieves the following results on the evaluation set (WER and CER are percentages; a sketch of how they can be computed follows the list):
- step: 49999
- eval_loss: 0.5299
- train_loss: 0.3369
- eval_wer: 11.9976
- eval_cer: 5.6236
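The exact evaluation script is not part of this card, but as a minimal sketch, WER and CER figures like the ones above can be computed with the Hugging Face `evaluate` library (the transcripts below are placeholders, not data from the evaluation set):

```python
# Minimal sketch: computing WER/CER with the `evaluate` library.
# The reference/prediction strings are placeholders only.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["dette er en test"]   # ground-truth transcripts
predictions = ["dette er en test"]  # model transcriptions

print("WER:", 100 * wer_metric.compute(references=references, predictions=predictions))
print("CER:", 100 * cer_metric.compute(references=references, predictions=predictions))
```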
## Model description
This is a BETA version. You need to accept the terms and conditions to use it.
## Using the Model
There are several ways of using this model, and we hope people will convert it into different formats. The code below allows you to process long files with Transformers:
```python
import torch
import librosa
from transformers import pipeline

# Use "cuda" if you have a GPU; other options are "mps" for Metal (Mac) and "cpu"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/small_scream_april_beta",
    chunk_length_s=30,
    device=device,
    max_new_tokens=128,
    generate_kwargs={"language": "no", "task": "transcribe"},  # "no" = Norwegian
)

# Load the WAV file as 16 kHz mono, the input format Whisper expects.
# librosa loads mp3 files the same way; just change the file name.
audio_path = "myfile.wav"
samples, sample_rate = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline on the raw samples
prediction = pipe(samples)["text"]
print(prediction)
```
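The evaluation results above were obtained with a beam size of 5, while the pipeline decodes greedily by default. To mirror that setting, `num_beams` can be passed through `generate_kwargs` at call time (a sketch, reusing `pipe` and `samples` from the example above):

```python
# Decode with beam search (num_beams=5) to match the evaluation setup
prediction = pipe(samples, generate_kwargs={"num_beams": 5})["text"]
print(prediction)
```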
## Training hyperparameters
The following hyperparameters were used during training (a sketch mapping the core ones onto Transformers training arguments follows the list):
- learning_rate: 6e-06
- lr_scheduler_type: linear
- per_device_train_batch_size: 16
- total_train_batch_size_per_node: 64
- total_train_batch_size: 64
- total_optimization_steps: 50000
- starting_optimization_step: None
- finishing_optimization_step: 50000
- num_train_dataset_workers: 32
- total_num_training_examples: 3200000
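For illustration only, here is how the core hyperparameters above would map onto `transformers.Seq2SeqTrainingArguments`. This is a sketch under the assumption of a standard Trainer setup, not the actual training script, and `output_dir` is a placeholder:

```python
# Sketch only: the card's core hyperparameters expressed as
# Seq2SeqTrainingArguments. Not the script this model was trained with.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./small_scream_april_beta",  # placeholder path
    learning_rate=6e-6,
    lr_scheduler_type="linear",
    per_device_train_batch_size=16,
    max_steps=50_000,           # total_optimization_steps
    dataloader_num_workers=32,  # num_train_dataset_workers
)
```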
## Training results
| step | eval_loss | train_loss | eval_wer | eval_cer |
|---|---|---|---|---|
| 0 | 1.5034 | 1.6162 | 32.7040 | 11.9022 |
| 0 | 1.5034 | 1.6129 | 32.7040 | 11.9022 |
| 0 | 1.5034 | 1.5855 | 32.7040 | 11.9022 |
| 2500 | 0.9684 | 0.6679 | 21.7113 | 8.0826 |
| 5000 | 0.8986 | 0.6577 | 18.6358 | 7.3167 |
| 7500 | 0.7365 | 0.4619 | 16.3825 | 6.8027 |
| 10000 | 0.6429 | 0.4965 | 14.7990 | 6.2887 |
| 12500 | 0.6688 | 0.4602 | 13.7942 | 6.0217 |
| 15000 | 0.6509 | 0.4650 | 13.3069 | 5.9965 |
| 17500 | 0.5692 | 0.3979 | 12.8502 | 5.6790 |
| 20000 | 0.5530 | 0.3931 | 13.0938 | 5.8554 |
| 22500 | 0.5320 | 0.4441 | 12.5457 | 5.7596 |
| 25000 | 0.5109 | 0.4116 | 12.7893 | 5.8503 |
| 27500 | 0.4855 | 0.3728 | 12.9111 | 5.8856 |
| 30000 | 0.4720 | 0.3842 | 12.6066 | 5.8201 |
| 32500 | 0.4889 | 0.3051 | 12.4239 | 5.7244 |
| 35000 | 0.5312 | 0.3388 | 12.6066 | 5.9259 |
| 37500 | 0.5138 | 0.3409 | 12.3934 | 5.7999 |
| 40000 | 0.5214 | 0.2886 | 11.9367 | 5.5530 |
| 42500 | 0.5420 | 0.3431 | 12.6675 | 5.9914 |
| 45000 | 0.5263 | 0.4015 | 12.3934 | 5.9360 |
| 47500 | 0.5378 | 0.3218 | 12.1194 | 5.6185 |
| 49999 | 0.5299 | 0.3369 | 11.9976 | 5.6236 |
## Framework versions
- Transformers 4.28.0.dev0
- Datasets 2.11.0
- Tokenizers 0.13.3