---

language:
- it
license: apache-2.0
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- it
- mozilla-foundation/common_voice_6_0
- speech
- xlsr-fine-tuning-week
datasets:
- mozilla-foundation/common_voice_8_0
model-index:
- name: XLS-R Wav2Vec2 Italian by radiogroup-crits
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: it
    metrics:
    - name: Test WER
      type: wer
      value: 9.04
    - name: Test CER
      type: cer
      value: 2.2
    - name: Test WER (+LM)
      type: wer
      value: 6.24
    - name: Test CER (+LM)
      type: cer
      value: 1.67
---

# XLS-R-1B-ITALIAN-DOC4LM-5GRAM

## Language model information

Our language model was built from a dataset of Italian Wikipedia articles and manual transcriptions of GR and television programs.


## Download the Common Voice 8.0 dataset for Italian
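```python
from datasets import load_dataset

# Common Voice is gated on the Hub: log in first (e.g. `huggingface-cli login`)
# so that use_auth_token=True can pick up your stored token.
dataset = load_dataset("mozilla-foundation/common_voice_8_0", "it", use_auth_token=True)
```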

## Evaluation Commands

To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:

```bash
# Run greedy decoding (no language model) on the Italian test split.
python eval.py \
  --model_id radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram \
  --dataset mozilla-foundation/common_voice_8_0 \
  --config it \
  --split test \
  --log_outputs \
  --greedy

# Rename the output files so a later run does not overwrite the greedy results.
mv log_mozilla-foundation_common_voice_8_0_it_test_predictions.txt log_mozilla-foundation_common_voice_8_0_it_test_predictions_greedy.txt
mv mozilla-foundation_common_voice_8_0_it_test_eval_results.txt mozilla-foundation_common_voice_8_0_it_test_eval_results_greedy.txt
```
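The WER and CER figures reported for this model are edit-distance rates between reference and predicted transcriptions. The following is a minimal, dependency-free sketch of that calculation, not the code used by `eval.py`; the function names `edit_distance` and `error_rate` are hypothetical, and it assumes whitespace tokenization for words and that spaces count as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent; unit is "word" (WER) or "char" (CER)."""
    total_errors = 0
    total_units = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_errors += edit_distance(ref_units, hyp_units)
        total_units += len(ref_units)
    return 100.0 * total_errors / total_units


# Illustrative sentences only, not taken from Common Voice.
refs = ["il gatto dorme sul divano"]
hyps = ["il gato dorme sul"]
print(error_rate(refs, hyps, unit="word"))  # 40.0 (one substitution, one deletion, five words)
print(error_rate(refs, hyps, unit="char"))  # 32.0 (eight deleted characters out of 25)
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.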

## Citation
To cite this model, use the following entry:

```bibtex
@misc{radiogroup-crits2022wav2vec2-xls-r-1b-italian-doc4lm-5gram,
  title={XLS-R Wav2Vec2 Italian by radiogroup-crits},
  author={Raffaele Teraoni Prioletti and Paolo Casagranda and Francesco Russo},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram}},
  year={2022}
}
```