Aditya3107's picture
Update README.md
1d170b6
metadata
language: ga
datasets:
  - common_voice
  - living-audio-Irish
metrics:
  - wer
tags:
  - audio
  - automatic-speech-recognition
  - ga-IE
  - speech
  - Irish
  - Gaelic
model-index:
  - name: Wav2vec 2.0 large 300m XLS-R
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 10.0
          type: common_voice
          args: ga-IE
        metrics:
          - name: Test WER
            type: wer
            value: 25.94

Irish-Gaelic Automatic Speech Recognition

This is the model for Irish ASR. It has been trained on the Common-voice dataset and living Irish audio dataset. The Common-voice code for the Irish language is ga-IE. From the Common voice dataset, all the Validated audio clips and all the living audio clips were taken into account and after a random train-test split, 90% of the total dataset (5156 utterances) were taken for training, and the rest of the 10% of real data (579 utterances) were taken for testing.

This dataset was finetuned on wav2vec2-large-xls-r-300m. On the testing dataset, 25.94% of WER could be achieved.

How to use

Example of transcribing the Common Voice audio clip from the invalidated dataset, using GPU if available. The model expects 16kHz audio.

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("Aditya3107/wav2vec2-large-xls-r-1b-ga-ie")
processor = Wav2Vec2Processor.from_pretrained("Aditya3107/wav2vec2-large-xls-r-1b-ga-ie")

# Reading taken audio clip
import librosa, torch
audio, rate = librosa.load("common-voice-irish/common_voice/cv-corpus-10.0-2022-07-04/ga-IE/clips/common_voice_ga-IE_1818627.mp3", sr = 16000)

# Taking an input value
input_values = processor(audio, sampling_rate=16_000, return_tensors = "pt", padding="longest").input_values
# Storing logits (non-normalized prediction values)
logits = model(input_values).logits
# Storing predicted ids
prediction = torch.argmax(logits, dim = -1)

# Passing the prediction to the tokenizer decode to get the transcription
transcription = processor.batch_decode(prediction)[0]
print(transcription)

Results

Example of the transcribed audio clips and testing on SCLITE.

Speaker sentences   0:     #utts: 1
           
id: (common_voice_ga-IE_17401296.mp3)
Scores: (#C #S #D #I) 4 1 0 0
Attributes: Case_sensitve 
REF:  an bhfuil cóta bán óir 
HYP:  an bhfuil cóta bán air  
Eval:                      S    

id: (common_voice_ga-IE_17410244.mp3)
Scores: (#C #S #D #I) 3 1 0 2
Attributes: Case_sensitve 
REF:  *** ** an bud é sin 
HYP:  cad é an rud é sin 
Eval: I   I     S          

id: (common_voice_ga-IE_17410257.mp3)
Scores: (#C #S #D #I) 9 2 1 2
Attributes: Case_sensitve 
REF:  i gabhaim buíochas libh a chairde ******* ** támindéagtstruth le tuilleadh uaibh ar baá 
HYP:  * gabhaim buíochas libh a chairde táimid ag tsnúth            le tuilleadh uaibh ar ball 
Eval: D                                  I       I  S                                        S    

id: (common_voice_ga-IE_17410401.mp3)
Scores: (#C #S #D #I) 6 1 0 0
Attributes: Case_sensitve 
REF:  níl ach tá peann ina phóca uige 
HYP:  níl ach tá peann ina phóca aige 
Eval:                               S    

id: (common_voice_ga-IE_17410403.mp3)
Scores: (#C #S #D #I) 5 1 0 1
Attributes: Case_sensitve 
REF:  agus *** cadé an dath atá air 
HYP:  agus cad é    an dath atá air 
Eval:      I   S                      

id: (common_voice_ga-IE_17410412.mp3)
Scores: (#C #S #D #I) 6 2 0 0
Attributes: Case_sensitve 
REF:  is lá é seo chun ceiliúradh  a dhéan    
HYP:  is lá é seo chun céiliúradh a dhéanamh 
Eval:                    S              S         

id: (common_voice_ga-IE_17444712.mp3)
Scores: (#C #S #D #I) 4 6 0 0
Attributes: Case_sensitve 
REF:  don chathaoileach  mirín   de brom  don stiúrdhóirat liam ón maoladha  
HYP:  don chathaoirleach máirín de brún don stiúrthóir   liam ó  maolaodha 
Eval:     S              S           S         S                   S   S         

id: (common_voice_ga-IE_17449454.mp3)
Scores: (#C #S #D #I) 4 0 0 0
Attributes: Case_sensitve 
REF:  ceacht a trí déag 
HYP:  ceacht a trí déag 
Eval:                     

Future Tasks

The language model with KenLM will be added if any good resource of Irish text is found.

Citation

If you want to cite this model you can use this:

@MISC {,
    author       = "Aditya Parikh",
    title        = "Finetuned XLS-R model for Irish (Ga-IE) language for Automatic Speech Recognition",
    howpublished = "{\url{https://huggingface.co/Aditya3107/wav2vec2-large-xls-r-1b-ga-ie}}",
    month        = "aug",
    year         = "2022"
}