ArTST-V2 (ASR task)

ArTST-v2 model fine-tuned for automatic speech recognition (speech-to-text) on the QASR dataset to improve generalization across Arabic dialects.

Model Description

  • Developed by: Speech Lab, MBZUAI
  • Model type: SpeechT5
  • Language: Arabic
  • Fine-tuned from: ArTST-v2 (pretrained checkpoint)

How to Get Started with the Model

import soundfile as sf
import torch
from transformers import (
    SpeechT5ForSpeechToText,
    SpeechT5Processor,
    SpeechT5Tokenizer,
)

# Run on the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "MBZUAI/artst-v2-asr"

# Load the tokenizer, the processor (feature extractor + tokenizer), and the model.
tokenizer = SpeechT5Tokenizer.from_pretrained(model_id)
processor = SpeechT5Processor.from_pretrained(model_id, tokenizer=tokenizer)
model = SpeechT5ForSpeechToText.from_pretrained(model_id).to(device)

# Read the waveform and its sampling rate. The SpeechT5 feature extractor
# defaults to 16 kHz mono input, so resample or downmix beforehand if needed.
audio, sr = sf.read("audio.wav")

# Convert the waveform to model inputs and generate token IDs.
inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
predicted_ids = model.generate(**inputs.to(device), max_length=150)

# Decode the token IDs into text.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
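
If the input file is stereo or not sampled at 16 kHz, the processor call above may reject it or give poor results. The following is a minimal sketch of a preprocessing step, assuming the checkpoint expects 16 kHz mono audio (the SpeechT5 feature extractor's default). The helper name to_mono_16k is hypothetical, and the resampling is plain linear interpolation with no anti-aliasing filter, so a dedicated resampler from an audio library is likely preferable for real use.

```python
import numpy as np

# Assumption: the model expects 16 kHz input, the SpeechT5 feature extractor default.
TARGET_SR = 16_000


def to_mono_16k(audio: np.ndarray, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Downmix to mono and linearly resample to target_sr (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float32)

    # soundfile returns shape (frames, channels) for multi-channel files;
    # average the channels to get a single mono track.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)

    # Nothing to resample if the rate already matches or the clip is empty.
    if sr == target_sr or audio.size == 0:
        return audio

    n_out = int(round(audio.shape[0] * target_sr / sr))
    # Positions of the output samples, expressed in input-sample units.
    src_pos = np.arange(n_out, dtype=np.float64) * (sr / target_sr)
    # Crude linear interpolation between neighbouring input samples.
    resampled = np.interp(src_pos, np.arange(audio.shape[0]), audio)
    return resampled.astype(np.float32)
```

With this helper, the snippet above could call audio = to_mono_16k(audio, sr) right after sf.read and then pass sampling_rate=16_000 to the processor.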

Citation

BibTeX:

@inproceedings{toyin-etal-2023-artst,
    title = "{A}r{TST}: {A}rabic Text and Speech Transformer",
    author = "Toyin, Hawau  and
      Djanibekov, Amirbek  and
      Kulkarni, Ajinkya  and
      Aldarmaki, Hanan",
    booktitle = "Proceedings of ArabicNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.arabicnlp-1.5",
    doi = "10.18653/v1/2023.arabicnlp-1.5",
    pages = "41--51",
}