A newer version of this model is available: suzii/vi-whisper-large-v3-turbo
Fine-tuned Whisper-V3-Turbo for Vietnamese ASR
This project fine-tunes the Whisper-V3-Turbo model (openai/whisper-large-v3-turbo) to improve its Automatic Speech Recognition (ASR) performance on Vietnamese. Training ran for 240 hours on a single NVIDIA A6000 GPU.
Data Sources
The training data combines the following Vietnamese speech corpora:
- capleaf/viVoice
- NhutP/VSV-1100
- doof-ferb/fpt_fosd
- doof-ferb/infore1_25hours
- google/fleurs (vi_vn)
- doof-ferb/LSVSC
- quocanh34/viet_vlsp
- linhtran92/viet_youtube_asr_corpus_v2
- doof-ferb/infore2_audiobooks
- linhtran92/viet_bud500
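As a rough illustration of how the corpora above could be pulled from the Hugging Face Hub, the sketch below streams each one with the `datasets` library. It is not the author's data pipeline. The `stream_corpora` helper and the `split="train"` default are assumptions: split names and column layouts differ between these repositories, and some of them may require accepting terms or logging in before access is granted.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hub repository ID -> config name (None when the list above gives no config).
CORPORA: Dict[str, Optional[str]] = {
    "capleaf/viVoice": None,
    "NhutP/VSV-1100": None,
    "doof-ferb/fpt_fosd": None,
    "doof-ferb/infore1_25hours": None,
    "google/fleurs": "vi_vn",
    "doof-ferb/LSVSC": None,
    "quocanh34/viet_vlsp": None,
    "linhtran92/viet_youtube_asr_corpus_v2": None,
    "doof-ferb/infore2_audiobooks": None,
    "linhtran92/viet_bud500": None,
}


def stream_corpora(split: str = "train") -> Iterator[Tuple[str, object]]:
    """Yield (repo_id, streaming dataset) pairs for every corpus in CORPORA.

    The split name is an assumption; adjust it per repository as needed.
    """
    # Imported lazily so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset

    for repo_id, config in CORPORA.items():
        # streaming=True fetches examples on demand instead of downloading
        # the whole corpus up front.
        yield repo_id, load_dataset(repo_id, config, split=split, streaming=True)
```

Iterating over `stream_corpora()` contacts the Hub, so it needs network access and the `datasets` package.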
Model
The base model is Whisper-V3-Turbo (openai/whisper-large-v3-turbo). Whisper is a multilingual ASR model trained on a large and diverse dataset; the checkpoint released here has been fine-tuned specifically for Vietnamese.
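Because Whisper itself is multilingual, it can be useful to pin the decoding language instead of relying on automatic language detection. The sketch below assumes a `transformers` speech-recognition pipeline like the one built in the Usage section and forwards `language` and `task` through `generate_kwargs`. The `transcribe_vietnamese` helper is hypothetical and not part of this repository.

```python
from typing import Any, Callable, Dict

# Decoding options forwarded to Whisper's generate(); pinning the language
# skips automatic language detection.
VI_GENERATE_KWARGS: Dict[str, str] = {"language": "vietnamese", "task": "transcribe"}


def transcribe_vietnamese(asr: Callable[..., Dict[str, Any]], audio_path: str) -> str:
    """Transcribe one audio file with the language pinned to Vietnamese.

    `asr` is expected to behave like a transformers
    "automatic-speech-recognition" pipeline: a callable returning a dict
    with a "text" key.
    """
    result = asr(
        audio_path,
        return_timestamps=True,
        generate_kwargs=VI_GENERATE_KWARGS,
    )
    return result["text"].strip()
```

With the pipeline from the Usage section this would be called as `transcribe_vietnamese(pipe, "your-audio.mp3")`.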
Training Configuration
- GPU Used: Nvidia A6000
- Training Time: 240 hours
- Wandb report
Usage
To transcribe audio with the fine-tuned model, run the following script (requires torch and transformers):
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available; otherwise fall back to CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "suzii/vi-whisper-large-v3-turbo-v1"

# Load the fine-tuned checkpoint and its processor (tokenizer + feature extractor).
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Wrap the model and processor in a speech-recognition pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# return_timestamps=True returns segment-level timestamps alongside the text.
result = pipe("your-audio.mp3", return_timestamps=True)
print(result["text"])
Acknowledgements
This project would not have been possible without the datasets listed under Data Sources.
Model tree for suzii/vi-whisper-large-v3-turbo-v1
- Base model: openai/whisper-large-v3
- Fine-tuned from: openai/whisper-large-v3-turbo