radiogroup-crits committed on
Commit
b63099b
1 Parent(s): 63e7175

Update README.md

Files changed (1)
  1. README.md +77 -77
README.md CHANGED
@@ -1,78 +1,78 @@
- ---
- language:
- - it
- license: apache-2.0
- datasets:
- - common_voice
- - mozilla-foundation/common_voice_8_0
- metrics:
- - wer
- - cer
- tags:
- - audio
- - automatic-speech-recognition
- - hf-asr-leaderboard
- - it
- - mozilla-foundation/common_voice_8_0
- - speech
- model-index:
- - name: XLS-R Wav2Vec2 Italian by radiogroup crits
- results:
- - task:
- name: Automatic Speech Recognition
- type: automatic-speech-recognition
- dataset:
- name: Common Voice 8
- type: mozilla-foundation/common_voice_8_0
- args: it
- metrics:
- - name: Test WER
- type: wer
- value: 9.04
- - name: Test CER
- type: cer
- value: 2.2
- - name: Test WER (+LM)
- type: wer
- value: 6.24
- - name: Test CER (+LM)
- type: cer
- value: 1.67
- ---
- # XLS-R-1B-ITALIAN-DOC4LM-5GRAM
-
- ## Language model information
-
- Our language model was generated using a dataset of Italian wikipedia articles and manual transcriptions about gr and television programs.
-
-
- ## Download CommonVoice8.0 dataset for italian language
- ```python
- from datasets import load_dataset
-
- dataset = load_dataset("mozilla-foundation/common_voice_8_0", "it", use_auth_token=True)
- ```
-
- ## Evaluation Commands
-
- To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:
-
- ```bash
- python eval.py --model_id radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram --dataset mozilla-foundation/common_voice_8_0 --config it --split test --log_outputs --greedy
- mv log_mozilla-foundation_common_voice_8_0_it_test_predictions.txt log_mozilla-foundation_common_voice_8_0_it_test_predictions_greedy.txt
- mv mozilla-foundation_common_voice_8_0_it_test_eval_results.txt mozilla-foundation_common_voice_8_0_it_test_eval_results_greedy.txt
- ```
-
- ## Citation
- If you want to cite this model you can use this:
-
- ```bibtex
- @misc{radiogroup-crits2022wav2vec2-xls-r-1b-italian-doc4lm-5gram,
- title={XLS-R Wav2Vec2 Italian by radiogroup-crits},
- author={Raffaele Teraoni Prioletti and Paolo Casagranda and Francesco Russo},
- publisher={Hugging Face},
- journal={Hugging Face Hub},
- howpublished={\url{https://huggingface.co/radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram}},
- year={2022}
- }
  ```
 
+ ---
+ language:
+ - it
+ license: apache-2.0
+ datasets:
+ - common_voice
+ - mozilla-foundation/common_voice_8_0
+ metrics:
+ - wer
+ - cer
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - hf-asr-leaderboard
+ - it
+ - mozilla-foundation/common_voice_8_0
+ - speech
+ model-index:
+ - name: XLS-R Wav2Vec2 Italian by radiogroup crits
+ results:
+ - task:
+ name: Automatic Speech Recognition
+ type: automatic-speech-recognition
+ dataset:
+ name: Common Voice 8
+ type: mozilla-foundation/common_voice_8_0
+ args: it
+ metrics:
+ - name: Test WER
+ type: wer
+ value: 9.04
+ - name: Test CER
+ type: cer
+ value: 2.2
+ - name: Test WER (+LM)
+ type: wer
+ value: 6.24
+ - name: Test CER (+LM)
+ type: cer
+ value: 1.67
+ ---
+ # XLS-R-1B-ITALIAN-DOC4LM-5GRAM
+
+ ## Language model information
+
+ Our language model was generated using a dataset of Italian wikipedia articles and manual transcriptions about gr and television programs.
+
+
+ ## Download CommonVoice8.0 dataset for italian language
+ ```python
+ from datasets import load_dataset
+
+ dataset = load_dataset("mozilla-foundation/common_voice_8_0", "it", use_auth_token=True)
+ ```
+
+ ## Evaluation Commands
+
+ To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:
+
+ ```bash
+ python eval.py --model_id radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram --dataset mozilla-foundation/common_voice_8_0 --config it --split test --log_outputs --greedy
+ mv log_mozilla-foundation_common_voice_8_0_it_test_predictions.txt log_mozilla-foundation_common_voice_8_0_it_test_predictions_greedy.txt
+ mv mozilla-foundation_common_voice_8_0_it_test_eval_results.txt mozilla-foundation_common_voice_8_0_it_test_eval_results_greedy.txt
+ ```
+
+ ## Citation
+ If you want to cite this model you can use this:
+
+ ```bibtex
+ @misc{radiogroup-crits2022wav2vec2-xls-r-1b-italian-doc4lm-5gram,
+ title={XLS-R Wav2Vec2 Italian by radiogroup-crits},
+ author={Raffaele Teraoni Prioletti, Paolo Casagranda and Francesco Russo},
+ publisher={Hugging Face},
+ journal={Hugging Face Hub},
+ howpublished={\url{https://huggingface.co/radiogroup-crits/wav2vec2-xls-r-1b-italian-doc4lm-5gram}},
+ year={2022}
+ }
  ```
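
The `wer` and `cer` values in the model-index metadata are edit-distance rates. As a rough illustration of what those numbers measure, the sketch below computes both with a plain Levenshtein distance. It is not the `eval.py` script referenced in the README, whose text normalization is not shown in this diff; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and the functions return fractions, so multiply by 100 to compare with the percentages in the metadata.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "il gatto dorme sul divano"
    hypothesis = "il gatto dorme sul divan"
    print(f"WER: {100 * wer(reference, hypothesis):.2f}%")
    print(f"CER: {100 * cer(reference, hypothesis):.2f}%")
```

Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves out.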