---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 0.244673
---
|
# Wav2Vec2-Large-XLSR-53-Moroccan-Darija |
|
|
|
**wav2vec2-large-xlsr-53** fine-tuned on 27 hours of labeled Darija audio recorded by 27 speakers.
|
|
|
# Old model vs new model |
|
|
|
<u>Old Model:</u>

- The training data contained numerous incorrect transcriptions.
- Transcriptions were produced by multiple transcribers.
- The audio database was not organized (by gender, age, region, etc.).
- The reported WER was incorrect.
|
|
|
<u>New Model:</u>

- Transcriptions are now performed by a single individual.
- Each hour of audio is spoken by a different person.
- Fine-tuning runs continuously to improve accuracy, and more data is added to the training set every day.
- The audio database is better organized.
- The reported WER is accurate.
|
|
|
<table><thead><tr><th><strong>Training Loss</strong></th> <th><strong>Validation Loss</strong></th> <th><strong>WER</strong></th></tr></thead> <tbody><tr>
<td>0.031600</td>
<td>0.316006</td>
<td>0.217313</td>
</tr> </tbody></table>
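For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal, dependency-free sketch of that definition. The function name `word_error_rate` is hypothetical, and this is not the evaluation script used to produce the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.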
|
|
|
## Usage |
|
|
|
The model can be used directly as follows: |
|
|
|
```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# build the tokenizer from a local copy of the vocabulary file
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

# load the audio data, resampled to 16 kHz (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)

# extract the model input features from the raw waveform
input_values = processor(input_audio, sampling_rate=sr, return_tensors="pt", padding=True).input_values

# retrieve logits (gradients are not needed for inference)
with torch.no_grad():
    logits = model(input_values).logits

# greedy decoding: pick the most likely token for each frame
tokens = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(tokens)

# print the output
print(transcription)
```
|
|
|
Output: قالت ليا هاد السيد هادا ما كاينش بحالو |
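The usage snippet takes the most likely token for every audio frame and then calls `batch_decode`. In CTC models such as Wav2Vec2, that decoding step merges consecutive repeated predictions and drops the blank token (the tokenizer's padding token plays this role). The sketch below illustrates the idea on plain integer IDs. The function name `ctc_greedy_collapse` and the choice `blank_id=0` are assumptions for illustration only; the real blank ID depends on the model's `vocab.json`.

```python
from itertools import groupby


def ctc_greedy_collapse(token_ids, blank_id=0):
    """Collapse consecutive repeats, then drop blank tokens."""
    # groupby yields one key per run of identical consecutive IDs
    collapsed = [token for token, _ in groupby(token_ids)]
    # blank_id=0 is an assumed placeholder, not the model's actual blank ID
    return [token for token in collapsed if token != blank_id]
```

The blank token is what lets the model emit the same character twice in a row: a blank between two identical IDs keeps them from being merged.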
|
|
|
email: [email protected] |
|
|
|
BOUMEHDI Ahmed |
|
|