language: ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- whisper-large-v3
- projecte-aina
- barcelona-supercomputing-center
- bsc
license: apache-2.0
model-index:
- name: whisper-large-v3-ca-3catparla
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: 3CatParla (Test)
type: projecte-aina/3catparla_asr
split: test
args:
language: ca
metrics:
- name: WER
type: wer
value: 0.96
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: 3CatParla (Dev)
type: projecte-aina/3catparla_asr
split: dev
args:
language: ca
metrics:
- name: WER
type: wer
value: 0.92
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Mozilla Common Voice 17.0 (Test)
type: mozilla-foundation/common_voice_17_0
split: test
args:
language: ca
metrics:
- name: WER
type: wer
value: 10.32
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Mozilla Common Voice 17.0 (Dev)
type: mozilla-foundation/common_voice_17_0
split: validation
args:
language: ca
metrics:
- name: WER
type: wer
value: 9.26
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Balearic fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Balearic female
args:
language: ca
metrics:
- name: WER
type: wer
value: 12.25
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Balearic male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Balearic male
args:
language: ca
metrics:
- name: WER
type: wer
value: 12.18
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Central fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Central female
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.51
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Central male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Central male
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.73
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northern fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northern female
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.09
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northern male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northern male
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.28
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northwestern fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northwestern female
args:
language: ca
metrics:
- name: WER
type: wer
value: 7.88
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northwestern male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northwestern male
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.44
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Valencian fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Valencian female
args:
language: ca
metrics:
- name: WER
type: wer
value: 9.58
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Valencian male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Valencian male
args:
language: ca
metrics:
- name: WER
type: wer
value: 9.1
library_name: transformers
whisper-large-v3-ca-3catparla
Summary
The "whisper-large-v3-ca-3catparla" is an acoustic model based on "openai/whisper-large-v3" suitable for Automatic Speech Recognition in Catalan.
Model Description
The "whisper-large-v3-ca-3catparla" is an acoustic model suitable for Automatic Speech Recognition in Catalan. It is the result of fine-tuning the model "openai/whisper-large-v3" on 710 hours of Catalan data released by Projecte AINA (Barcelona, Spain).
Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in Catalan. It is intended to transcribe Catalan audio files to plain text without punctuation.
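Because the model outputs plain text without punctuation, reference transcripts usually need to be brought into the same form before they are compared with its output. The following is a minimal sketch of such a clean-up step. The function name `to_plain_text` and the set of characters kept inside words are assumptions made for illustration; this is not the normalizer used to obtain the scores reported in this card.

```python
import unicodedata

# Characters kept inside words (an assumption for this sketch):
# apostrophe (l'home), hyphen (dóna-me) and the Catalan middle dot (col·legi).
_KEEP = "'-·"

def to_plain_text(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace (illustrative only)."""
    text = unicodedata.normalize("NFC", text).lower()
    # Unify the typographic apostrophe with the ASCII one.
    text = text.replace("\u2019", "'")
    cleaned = []
    for ch in text:
        if ch in _KEEP or ch.isalnum() or ch.isspace():
            cleaned.append(ch)
        elif unicodedata.category(ch).startswith("P"):
            cleaned.append(" ")  # punctuation acts as a word separator
        # any other character (symbols, control characters) is dropped
    # Strip kept characters left dangling at word edges, e.g. a lone dash.
    words = (w.strip(_KEEP) for w in "".join(cleaned).split())
    return " ".join(w for w in words if w)

print(to_plain_text("Bon dia, col·legues! Com esteu?"))
# bon dia col·legues com esteu
```

Keeping the middle dot and the apostrophe avoids splitting Catalan words such as "col·legi" or "l'home" into separate tokens, which would otherwise inflate the word count.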
How to Get Started with the Model
For an up-to-date, working version of this code, please see our Notebook.
Installation
In order to use this model, you may install datasets and transformers:
Create a virtual environment:
python -m venv /path/to/venv
Activate the environment:
source /path/to/venv/bin/activate
Install the modules:
pip install datasets transformers
For Inference
In order to transcribe audio in Catalan using this model, you can follow this example:
#Install Prerequisites
pip install torch
pip install datasets
pip install 'transformers[torch]'
pip install evaluate
pip install jiwer
#This code works with GPU
#Note (November 2024): load_metric is no longer part of datasets,
#so the WER is computed with evaluate's load instead.
import torch
from datasets import load_dataset, Audio
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

#Load the processor and model.
MODEL_NAME="projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

#Load the dataset
ds=load_dataset("projecte-aina/3catparla_asr",split='test')

#Resample to 16kHz
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

#Process the dataset: transcribe one example and normalize both texts
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer._normalize(batch['normalized_text'])
    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]
    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)
    return batch

#Do the evaluation
result = ds.map(map_to_pred)

#Compute the overall WER (in percent) now.
wer = load("wer")
WER=100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
Test Result (WER, %): 0.96
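To make the reported number easier to interpret, the sketch below computes a word error rate in plain Python: the word-level edit distance (substitutions, deletions, insertions) summed over all utterances, divided by the total number of reference words, and scaled to percent. The function name `word_error_rate` and the toy sentences are illustrative assumptions; the scores in this card come from the `evaluate` metric used in the example above, not from this code.

```python
def word_error_rate(references, predictions):
    """Corpus-level WER in percent: total word edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words, hyp_words = ref.split(), hyp.split()
        # Single-row dynamic programme for the Levenshtein distance over words.
        row = list(range(len(hyp_words) + 1))
        for i, r in enumerate(ref_words, start=1):
            prev_diag, row[0] = row[0], i
            for j, h in enumerate(hyp_words, start=1):
                prev_diag, row[j] = row[j], min(
                    row[j] + 1,            # deletion of a reference word
                    row[j - 1] + 1,        # insertion of a hypothesis word
                    prev_diag + (r != h),  # substitution (or match, cost 0)
                )
        total_edits += row[-1]
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * total_edits / total_words

# Toy example: one substituted word out of five reference words.
refs = ["el gat dorm", "bon dia"]
hyps = ["el gat dorm", "bon día"]
print(word_error_rate(refs, hyps))  # 20.0
```

Errors are pooled over the whole set before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.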
Training Details
Training data
The specific dataset used to create the model is called "3CatParla".
Training procedure
This model is the result of fine-tuning the model "openai/whisper-large-v3" by following this tutorial provided by Hugging Face.
Training Hyperparameters
- language: catalan
- hours of training audio: 710
- learning rate: 1.95e-07
- sample rate: 16000
- train batch size: 32 (x4 GPUs)
- gradient accumulation steps: 1
- eval batch size: 32
- save total limit: 3
- max steps: 19842
- warmup steps: 1984
- eval steps: 3307
- save steps: 3307
- shuffle buffer size: 480
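The values above can be combined to get a rough picture of the training run. The sketch below assumes that "32 (x4 GPUs)" means a per-device batch of 32 on 4 GPUs, and that the learning rate follows a linear warmup followed by a linear decay to zero; the schedule shape is not stated in this card, so treat it as an assumption. All names are illustrative.

```python
# Values copied from the hyperparameter list above.
PEAK_LR = 1.95e-07
WARMUP_STEPS = 1984
MAX_STEPS = 19842
EVAL_STEPS = 3307
PER_DEVICE_BATCH = 32   # assumed to be per GPU
NUM_GPUS = 4
GRAD_ACCUM = 1

def effective_batch_size() -> int:
    """Number of examples contributing to one optimizer step."""
    return PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)

print(effective_batch_size())              # 128
print(effective_batch_size() * MAX_STEPS)  # 2539776 examples seen in total
print(MAX_STEPS // EVAL_STEPS)             # 6 evaluations / checkpoints
```

Under these assumptions the warmup covers roughly the first 10% of training, and with a save total limit of 3 only the three most recent of the six checkpoints are kept on disk.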
Citation
If this model contributes to your research, please cite the work:
@misc{mena2024whisperlarge3catparla,
title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.},
author={Hernandez Mena, Carlos Daniel and Armentano-Oller, Carme and Solito, Sarah and Külebi, Baybars},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla},
year={2024}
}
Additional Information
Author
The fine-tuning process was performed in July 2024 in the Language Technologies Unit of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena.
Contact
For further information, please send an email to [email protected].
Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
License
Apache-2.0
Funding
This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.
The training of the model was possible thanks to the compute time provided by Barcelona Supercomputing Center through MareNostrum 5.