MMS speech recognition for Ugandan languages

This is a fine-tuned version of facebook/mms-1b-all for Ugandan languages, trained with the SALT dataset. The languages supported are:

| Code | Language |
|------|----------|
| lug | Luganda |
| ach | Acholi |
| lgg | Lugbara |
| teo | Ateso |
| nyn | Runyankole |
| eng | English (Ugandan) |

Each of the five non-English languages has two adapters: one optimised for speech entirely in that language, and one (suffixed `+eng`) for speech in which code-switching with English is expected.
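The adapter naming follows the pattern in the usage example's comment: a bare language code for monolingual speech and `<code>+eng` for code-switched speech. The helper below is a minimal, hypothetical sketch (`adapter_for` and `AVAILABLE_ADAPTERS` are illustrative names, not part of this model or of `transformers`) that picks an adapter name under that assumed scheme.

```python
# Adapter names as listed in this model card's usage example.
AVAILABLE_ADAPTERS = {
    'lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
    'nyn', 'nyn+eng', 'teo', 'teo+eng',
}


def adapter_for(language: str, code_switching: bool = False) -> str:
    """Return the adapter name for a language code (hypothetical helper).

    Assumes code-switched adapters are named '<code>+eng'.
    """
    name = f"{language}+eng" if code_switching else language
    if name not in AVAILABLE_ADAPTERS:
        raise ValueError(f"No adapter named {name!r}")
    return name


print(adapter_for('ach'))                       # ach
print(adapter_for('teo', code_switching=True))  # teo+eng
```

The returned name is what would be passed to both `model.load_adapter` and `processor.tokenizer.set_target_lang`.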

Usage

Usage is the same as for the base model, except that a different set of adapters is available.

import torch
import transformers
import datasets

# Available adapters:
# ['lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
#  'nyn', 'nyn+eng', 'teo', 'teo+eng']
language = 'lug'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = transformers.Wav2Vec2ForCTC.from_pretrained(
    'Sunbird/asr-mms-salt').to(device)
model.load_adapter(language)

processor = transformers.Wav2Vec2Processor.from_pretrained(
    'Sunbird/asr-mms-salt')
processor.tokenizer.set_target_lang(language)

# Get some test audio
ds = datasets.load_dataset('Sunbird/salt', 'multispeaker-lug', split='test')
audio = ds[0]['audio']
sample_rate = ds[0]['sample_rate']

# Apply the model (MMS models expect 16 kHz audio; resample first if
# sample_rate differs)
inputs = processor(audio, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs.to(device)).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)

print(transcription)
# ekikola ky'akasooli kyakyenvu wabula langi yakyo etera okuba eyaakitaka wansi
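The last step takes the highest-scoring token at each audio frame and lets `processor.decode` turn that ID sequence into text. For a CTC tokenizer this means merging consecutive repeated IDs, dropping the padding (blank) token, and turning the word delimiter `|` into a space. The following is a simplified pure-Python sketch of that collapse using a made-up toy vocabulary; the real tokenizer also handles special tokens, so treat this as an illustration rather than the library's implementation.

```python
from itertools import groupby

# Toy vocabulary (hypothetical): index 0 is the CTC blank/pad token and
# '|' marks word boundaries, mirroring the Wav2Vec2 tokenizer convention.
VOCAB = ['<pad>', '|', 'a', 'b', 'k', 'o', 'l']
BLANK_ID = 0


def greedy_ctc_decode(frame_ids: list[int]) -> str:
    """Collapse per-frame argmax IDs into text, CTC style."""
    # 1. Merge runs of identical IDs (e.g. 'k k' -> 'k').
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal IDs keeps both of them.
    chars = [VOCAB[i] for i in merged if i != BLANK_ID]
    # 3. Turn word delimiters into spaces.
    return ''.join(chars).replace('|', ' ').strip()


# The blank before the final 'o' keeps it separate from the preceding 'o' run.
frames = [4, 4, 0, 5, 6, 6, 0, 2, 1, 1, 3, 0, 5, 5, 0, 5]
print(greedy_ctc_decode(frames))  # kola boo
```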

The output of this model is unpunctuated and lowercase. For applications that need formatted text, consider the alternative model Sunbird/asr-whisper-large-v2-salt.
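Because transcripts are lowercase and unpunctuated, comparing them with punctuated reference text (for example when computing word error rate) is usually only meaningful after normalizing the reference the same way. Below is a minimal sketch that assumes a simple normalization (lowercase, strip punctuation except apostrophes, collapse whitespace) and a plain word-level Levenshtein WER. The normalization used to evaluate this model is not stated here, so these choices are illustrative, and the punctuated reference string in the usage lines is made up.

```python
import re
import string

# Keep apostrophes: the model output contains them (e.g. "ky'akasooli").
_PUNCT = ''.join(c for c in string.punctuation if c != "'")
_PUNCT_RE = re.compile(f"[{re.escape(_PUNCT)}]")


def normalize(text: str) -> str:
    """Lowercase, remove punctuation (except apostrophes), collapse spaces."""
    text = _PUNCT_RE.sub(' ', text.lower())
    return ' '.join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # Row i of the DP table holds distances between ref[:i] and each hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("Ekikola ky'akasooli kyakyenvu.",
                      "ekikola ky'akasooli kyakyenvu"))  # 0.0
```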
