---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 0.244673
---
# Wav2Vec2-Large-XLSR-53-Moroccan-Darija
**wav2vec2-large-xlsr-53** fine-tuned on 27 hours of labeled Darija audio spoken by 27 speakers.
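Fine-tuning an XLSR checkpoint with a CTC head needs a character vocabulary derived from the labeled transcriptions; the usage code loads one from `vocab.json` with `[UNK]`, `[PAD]` and `|` as special tokens. The card does not document how that file was produced, so the following is only a minimal sketch, assuming the character-level approach commonly used for XLSR fine-tuning. The function names `build_ctc_vocab` and `save_vocab` and the sample transcriptions are hypothetical.

```python
import json


def build_ctc_vocab(transcripts):
    """Map every distinct character in the transcripts to an integer id."""
    # spaces are represented by the '|' word delimiter, so leave them out of the character set
    chars = sorted(set("".join(transcripts)) - {" ", "|"})
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # special tokens matching the Wav2Vec2CTCTokenizer arguments in the usage code
    vocab["|"] = len(vocab)
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def save_vocab(vocab, path="vocab.json"):
    # ensure_ascii=False keeps Arabic characters readable in the JSON file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)


if __name__ == "__main__":
    # hypothetical sample transcriptions, not the model's training data
    print(build_ctc_vocab(["قالت ليا", "ما كاينش"]))
```

In practice, transcriptions are usually normalized (punctuation removed, spelling variants unified) before this step, since every distinct character becomes its own output class.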
## Old model vs. new model
<u>Old Model:</u>
- The training data contained numerous incorrect transcriptions.
- Transcriptions were made by multiple transcribers.
- The audio database was not organized (by gender, age, region, etc.).
- The reported WER was inaccurate.
<u>New Model:</u>
- All transcriptions are now made by a single person.
- Each hour of audio is spoken by a different speaker.
- Fine-tuning runs continuously (24/7) to improve accuracy, and new data is added to the model every day.
- The audio database is better organized.
- The reported WER reflects the true error rate.
<table><thead><tr><th><strong>Training Loss</strong></th> <th><strong>Validation Loss</strong></th> <th><strong>WER</strong></th></tr></thead> <tbody><tr><td>0.057800</td> <td>0.297430</td> <td>0.244673</td></tr> </tbody></table>
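Under the standard definition, WER is the number of word substitutions, deletions and insertions needed to turn the model's transcription into the reference, divided by the number of words in the reference; a WER of 0.244673 therefore corresponds to errors amounting to roughly 24.5% of the reference words. The card does not say which tool computed the score, so this is only a minimal sketch of the standard word-level edit-distance computation; `word_error_rate` is a hypothetical name and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # one substitution (sat -> sit) and one deletion (the) over 6 reference words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # prints 0.3333333333333333
```

Over a whole test set, the edit counts and reference word counts are usually summed across utterances before dividing, rather than averaging per-utterance WERs.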
## Usage
The model can be used directly as follows:
```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# build the CTC tokenizer from the model's character vocabulary (vocab.json)
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

# load the audio data, resampled to 16 kHz (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)

# convert the raw waveform into normalized model input values
input_values = processor(input_audio, sampling_rate=16000, return_tensors="pt", padding=True).input_values

# retrieve logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(input_values).logits

# greedy CTC decoding: pick the most likely token at each frame
tokens = torch.argmax(logits, dim=-1)

# collapse repeated tokens and blanks into text (no n-gram language model is used)
transcription = tokenizer.batch_decode(tokens)

# print the output
print(transcription)
```
Example output: قالت ليا هاد السيد هادا ما كاينش بحالو (English: "She told me there is no one like this man.")
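The last two steps of the usage code (taking `torch.argmax` over the logits, then `tokenizer.batch_decode`) perform greedy CTC decoding without any language model. The following minimal pure-Python sketch illustrates what the collapse step does; it roughly mirrors, but is not, the tokenizer's own implementation. The function name `ctc_greedy_collapse` and the toy vocabulary are hypothetical. It assumes, as in the Hugging Face wav2vec2 implementation, that the padding token serves as the CTC blank.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, id_to_token, blank_id, word_delimiter="|"):
    """Turn per-frame argmax ids into text, as greedy CTC decoding does."""
    # 1. merge runs of identical ids: [4, 4, 4] -> [4]
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. drop the CTC blank (the [PAD] token plays this role in wav2vec2)
    kept = [token_id for token_id in merged if token_id != blank_id]
    # 3. map ids to characters and turn the word delimiter into spaces
    text = "".join(id_to_token[token_id] for token_id in kept)
    return " ".join(text.replace(word_delimiter, " ").split())


if __name__ == "__main__":
    # hypothetical toy vocabulary, not the model's real vocab.json
    toy_vocab = {0: "[PAD]", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o", 6: "w"}
    frames = [2, 2, 3, 4, 4, 0, 4, 5, 5, 1, 1, 4, 5, 6, 6, 0]
    print(ctc_greedy_collapse(frames, toy_vocab, blank_id=0))  # prints "hello low"
```

The blank (id 0) between the two runs of `l` frames is what preserves the double letter; without it, repeated frames merge into a single character, so `[2, 3, 4, 4, 5]` decodes to `"helo"`.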
email: [email protected]
BOUMEHDI Ahmed