boumehdi/wav2vec2-large-xlsr-moroccan-darija

Automatic Speech Recognition

Moroccan Arabic

xlsr-fine-tuning-week

Inference Endpoints

boumehdi committed on Jan 8, 2024

Commit 6c8f71f · 1 Parent(s): 9199901

Update README.md

Files changed (1)

README.md +3 -6

README.md CHANGED

@@ -21,14 +21,11 @@ model-index:
 ---
 # Wav2Vec2-Large-XLSR-53-Moroccan-Darija
-**wav2vec2-large-xlsr-53**
-- Fine-tuned on 34 hours (34 people) of labeled Darija Audios.
-- Each hour of audio is pronounced by a different person.
-- Transcriptions are performed by a single individual.
+**wav2vec2-large-xlsr-53 new model**
+- Fine-tuned on 34 hours of labeled Darija Audios extracted from MDVC corpus. MDVC Corpus contains more than 1000 hours of Moroccan Darija "ary".
 - Fine-tuning is ongoing 24/7 to enhance accuracy.
-- We are consistently adding more data to the model every day.
-- Audio database is organized (by sex, age, region, ..)
+- We are consistently adding data to the model every day (We prefer not to add all MDVC Corpus at once as we are trying to standardize the way we write this language).
 <table><thead><tr><th><strong>Training Loss</strong></th> <th><strong>Validation</strong></th> <th><strong>Loss Wer</strong></th></tr></thead> <tbody><tr>
 <td>0.022800</td>