---
language:
- it
license: apache-2.0
datasets:
- common_voice
- mozilla-foundation/common_voice_8_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- it
- mozilla-foundation/common_voice_8_0
- speech
model-index:
- name: XLS-R Wav2Vec2 Italian by radiogroup crits
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: it
    metrics:
    - name: Test WER
      type: wer
      value: 9.04
    - name: Test CER
      type: cer
      value: 2.2
    - name: Test WER (+LM)
      type: wer
      value: 6.24
    - name: Test CER (+LM)
      type: cer
      value: 1.67
---

# XLS-R-1B-ITALIAN-DOC4LM-5GRAM

## Language model information

Our language model was built from a dataset of Italian Wikipedia articles and manual transcriptions of radio news (GR) and television programs.

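## Download the Common Voice 8.0 dataset for Italian

```python
from datasets import load_dataset

# The dataset is gated: accept its terms on the Hugging Face Hub and log in
# (e.g. with `huggingface-cli login`) so that use_auth_token=True can find your token.
dataset = load_dataset("mozilla-foundation/common_voice_8_0", "it", use_auth_token=True)
```
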
## Evaluation Commands

To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:

```bash
# Run the evaluation with the --greedy flag, then rename the output files with a _greedy suffix
python eval.py --model_id radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram --dataset mozilla-foundation/common_voice_8_0 --config it --split test --log_outputs --greedy
mv log_mozilla-foundation_common_voice_8_0_it_test_predictions.txt log_mozilla-foundation_common_voice_8_0_it_test_predictions_greedy.txt
mv mozilla-foundation_common_voice_8_0_it_test_eval_results.txt mozilla-foundation_common_voice_8_0_it_test_eval_results_greedy.txt
```
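
The evaluation reports WER and CER. As a rough illustration of what these two metrics measure, here is a minimal pure-Python sketch based on Levenshtein distance. It is not the code used by `eval.py`, which may normalize text or rely on a metrics library. The function names `edit_distance`, `wer` and `cer` and the sample sentences are hypothetical and chosen only for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Word error rate: word-level edits divided by the number of reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def cer(references, hypotheses):
    """Character error rate: character-level edits divided by reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars


# Hypothetical example: one wrong word out of three, one wrong character out of fourteen.
refs = ["il gatto dorme"]
hyps = ["il gatto dorma"]
print(f"WER: {100 * wer(refs, hyps):.2f}%")  # WER: 33.33%
print(f"CER: {100 * cer(refs, hyps):.2f}%")  # CER: 7.14%
```

Both rates are computed over the whole test set (total edits divided by total reference length) rather than averaged per sentence, which is the usual convention for corpus-level scores.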

## Citation

If you want to cite this model, you can use the following entry:

```bibtex
@misc{radiogroup-crits2022wav2vec2-xls-r-1b-italian-doc4lm-5gram,
  title={XLS-R Wav2Vec2 Italian by radiogroup-crits},
  author={Raffaele Teraoni Prioletti and Paolo Casagranda and Francesco Russo},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram}},
  year={2022}
}
```