---
language:
- it
license: apache-2.0
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- it
- mozilla-foundation/common_voice_6_0
- speech
- xlsr-fine-tuning-week
datasets:
- mozilla-foundation/common_voice_8_0
model-index:
- name: XLS-R Wav2Vec2 Italian by radiogroup crits
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: it
    metrics:
    - name: Test WER
      type: wer
      value: 9.04
    - name: Test CER
      type: cer
      value: 2.2
    - name: Test WER (+LM)
      type: wer
      value: 6.24
    - name: Test CER (+LM)
      type: cer
      value: 1.67
---
# XLS-R-1B-ITALIAN-DOC4LM-5GRAM
## Language model information
Our language model was built from a corpus of Italian Wikipedia articles and manual transcriptions of radio news bulletins (GR, *giornale radio*) and television programs.
## Download the Common Voice 8.0 dataset for Italian
```python
from datasets import load_dataset
dataset = load_dataset("mozilla-foundation/common_voice_8_0", "it", use_auth_token=True)
```
## Evaluation Commands
To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:
```bash
python eval.py --model_id radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram --dataset mozilla-foundation/common_voice_8_0 --config it --split test --log_outputs --greedy
mv log_mozilla-foundation_common_voice_8_0_it_test_predictions.txt log_mozilla-foundation_common_voice_8_0_it_test_predictions_greedy.txt
mv mozilla-foundation_common_voice_8_0_it_test_eval_results.txt mozilla-foundation_common_voice_8_0_it_test_eval_results_greedy.txt
```
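The Test WER and Test CER values in the metadata are edit-distance ratios, presumably expressed as percentages. As a rough illustration of what those numbers measure, here is a minimal pure-Python sketch. It is not the code used by `eval.py`; the helper names `edit_distance`, `wer`, and `cer` and the example sentences are made up for this illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example pair: one misspelled word out of five.
reference = "il gatto dorme sul divano"
hypothesis = "il gato dorme sul divano"
print(f"WER: {100 * wer(reference, hypothesis):.2f}%")  # WER: 20.00%
print(f"CER: {100 * cer(reference, hypothesis):.2f}%")  # CER: 4.00%
```

This sketch scores a single sentence pair and assumes a non-empty reference. Corpus-level scores are usually computed by summing edits and reference lengths over all utterances before dividing, and evaluation scripts typically normalize the text (casing, punctuation) first.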
## Citation
To cite this model, use the following entry:
```bibtex
@misc{radiogroup-crits2022wav2vec2-xls-r-1b-italian-doc4lm-5gram,
title={XLS-R Wav2Vec2 Italian by radiogroup-crits},
author={Raffaele Teraoni Prioletti and Paolo Casagranda and Francesco Russo},
publisher={Hugging Face},
journal={Hugging Face Hub},
howpublished={\url{https://huggingface.co/radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram}},
year={2022}
}
```