poonehmousavi committed on
Commit 0361691 · 1 Parent(s): bb52bb9

Update README.md

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  language:
- - en
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
@@ -15,31 +15,31 @@ metrics:
  - wer
  - cer
  model-index:
- - name: asr-wav2vec2-commonvoice-14-en
  results:
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
  dataset:
- name: CommonVoice Corpus 14.0 (English)
  type: mozilla-foundation/common_voice_14.0
- config: en
  split: test
  args:
- language: en
  metrics:
  - name: Test WER
  type: wer
- value: '16.68'
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # wav2vec 2.0 with CTC trained on CommonVoice English (No LM)

  This repository provides all the necessary tools to perform automatic speech
- recognition from an end-to-end system pretrained on CommonVoice (English Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).

@@ -47,14 +47,14 @@ The performance of the model is the following:

  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
- | 15-08-23 | 7.92 | 16.86 | 1xV100 32GB |

  ## Pipeline description

  This ASR system is composed of 2 different but linked blocks:
  - Tokenizer (unigram) that transforms words into subword units and is trained with
- the train transcriptions (train.tsv) of CommonVoice (en).
- - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60)) is combined with two DNN layers and finetuned on CommonVoice EN.
  The obtained final acoustic representation is given to the CTC decoder.

  The system is trained with recordings sampled at 16kHz (single channel).
@@ -71,13 +71,13 @@ pip install speechbrain transformers
  Please note that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Transcribing your own audio files (in English)

  ```python
  from speechbrain.pretrained import EncoderASR

- asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-14-en", savedir="pretrained_models/asr-wav2vec2-commonvoice-14-en")
- asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-14-en/example-en.wav")

  ```
  ### Inference on GPU
@@ -103,10 +103,10 @@ pip install -e .
  3. Run Training:
  ```bash
  cd recipes/CommonVoice/ASR/CTC/
- python train_with_wav2vec.py hparams/train_en_with_wav2vec.yaml --data_folder=your_data_folder
  ```

- You can find our training results (models, logs, etc.) [here](https://www.dropbox.com/sh/ch10cnbhf1faz3w/AACdHFG65LC6582H0Tet_glTa?dl=0).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
  ---
  language:
+ - it
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
 
  - wer
  - cer
  model-index:
+ - name: asr-wav2vec2-commonvoice-14-it
  results:
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
  dataset:
+ name: CommonVoice Corpus 14.0 (Italian)
  type: mozilla-foundation/common_voice_14.0
+ config: it
  split: test
  args:
+ language: it
  metrics:
  - name: Test WER
  type: wer
+ value: '8.28'
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

+ # wav2vec 2.0 with CTC trained on CommonVoice Italian (No LM)

  This repository provides all the necessary tools to perform automatic speech
+ recognition from an end-to-end system pretrained on CommonVoice (Italian Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).

 

  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
+ | 15-08-23 | 2.38 | 8.38 | 1xV100 32GB |

  ## Pipeline description

  This ASR system is composed of 2 different but linked blocks:
  - Tokenizer (unigram) that transforms words into subword units and is trained with
+ the train transcriptions (train.tsv) of CommonVoice (it).
+ - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([wav2vec2-large-it-voxpopuli](https://huggingface.co/facebook/wav2vec2-large-it-voxpopuli)) is combined with two DNN layers and finetuned on CommonVoice IT.
  The obtained final acoustic representation is given to the CTC decoder.

  The system is trained with recordings sampled at 16kHz (single channel).
 
  Please note that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

+ ### Transcribing your own audio files (in Italian)

  ```python
  from speechbrain.pretrained import EncoderASR

+ asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-14-it", savedir="pretrained_models/asr-wav2vec2-commonvoice-14-it")
+ asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-14-it/example-it.wav")

  ```
  ### Inference on GPU
 
  3. Run Training:
  ```bash
  cd recipes/CommonVoice/ASR/CTC/
+ python train_with_wav2vec.py hparams/train_it_with_wav2vec.yaml --data_folder=your_data_folder
  ```

+ You can find our training results (models, logs, etc.) [here](https://www.dropbox.com/sh/hthxqzh5boq15rn/AACftSab_FM6EFWWPgHpKw82a?dl=0).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
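
Note on the metrics changed in this commit: the Test WER and Test CER figures in the model card are word-level and character-level error rates. As a rough illustration of what these numbers measure, the sketch below computes both from a Levenshtein edit distance. This is not the SpeechBrain recipe's scoring code, and the function names (`edit_distance`, `wer`, `cer`) and the Italian example sentence are hypothetical; the actual recipe may normalize text differently before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (whitespace tokenization, an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces counted as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "il gatto dorme sul divano"
    hypothesis = "il gatto dorme sul divani"
    print(f"WER: {wer(reference, hypothesis):.2f}")  # one wrong word out of five
    print(f"CER: {cer(reference, hypothesis):.2f}")  # one wrong character out of 25
```

A single wrong character makes the whole word count as an error, which is why WER is typically several times larger than CER on the same test set, as in the results table above.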