File size: 2,110 Bytes
361f4f3 20a52e8 361f4f3 3128428 361f4f3 2c5ff3c eded877 ea3ec0d dc50ea5 1148e57 a2f4e64 9343b4d 5ee2a89 20a52e8 5ee2a89 37daef4 361f4f3 163eb69 361f4f3 3128428 361f4f3 f0b89f3 361f4f3 2e5002b a825c09 f0b89f3 361f4f3 bda656c dc50ea5 eded877 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 |
---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
results:
- task:
name: Speech Recognition
type: automatic-speech-recognition
metrics:
- name: Test WER
type: wer
value: 0.172139
---
# Wav2Vec2-Large-XLSR-53-Moroccan-Darija
**wav2vec2-large-xlsr-53**
- Fine-tuned on 31 hours (31 people) of labeled Darija Audios.
- Each hour of audio is pronounced by a different person.
- Transcriptions are performed by a single individual.
- Fine-tuning is ongoing 24/7 to enhance accuracy.
- We are consistently adding more data to the model every day.
- Audio database is organized (by sex, age, region, ..)
<table><thead><tr><th><strong>Training Loss</strong></th> <th><strong>Validation</strong></th> <th><strong>Loss Wer</strong></th></tr></thead> <tbody><tr>
<td>0.018800</td>
<td>0.262561</td>
<td>0.172139</td>
</tr> </tbody></table>
## Usage
The model can be used directly as follows:
```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor, TrainingArguments, Wav2Vec2FeatureExtractor, Trainer
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model=Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')
# load the audio data (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)
# tokenize
input_values = processor(input_audio, return_tensors="pt", padding=True).input_values
# retrieve logits
logits = model(input_values).logits
tokens = torch.argmax(logits, axis=-1)
# decode using n-gram
transcription = tokenizer.batch_decode(tokens)
# print the output
print(transcription)
```
Output: قالت ليا هاد السيد هادا ما كاينش بحالو
email: [email protected]
BOUMEHDI Ahmed
|