Add links to SDP configs (#5), opened by igitman

README.md CHANGED
@@ -191,9 +191,9 @@ The tokenizers for these models were built using the text transcripts of the tra
 
 The model in this collection are trained on a composite dataset (NeMo PnC IT ASRSET) comprising of 487 hours of Italian speech:
 
-- Mozilla Common Voice 12.0 (Italian) - 220 hours after data cleaning
-- Multilingual LibriSpeech (Italian) - 214 hours after data cleaning
-- VoxPopuli transcribed subset (Italian) - 53 hours after data cleaning
+- Mozilla Common Voice 12.0 (Italian) - 220 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/mcv/config.yaml).
+- Multilingual LibriSpeech (Italian) - 214 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/mls/config.yaml).
+- VoxPopuli transcribed subset (Italian) - 53 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/voxpopuli/config.yaml).
 
 ## Performance
 
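
As a quick sanity check on the numbers in this change, the per-dataset hours can be summed and compared with the 487-hour total stated in the README. The sketch below is illustrative only: the dictionary and function names (`HOURS_AFTER_CLEANING`, `SDP_CONFIGS`, `total_hours`, `share_of_total`) are hypothetical and not part of NeMo or the Speech Data Processor; the hours and config URLs are copied from the diff.

```python
# Hours per source after data cleaning, copied from the README lines in this diff.
HOURS_AFTER_CLEANING = {
    "Mozilla Common Voice 12.0 (Italian)": 220,
    "Multilingual LibriSpeech (Italian)": 214,
    "VoxPopuli transcribed subset (Italian)": 53,
}

# Speech Data Processor config links added by this change (same keys as above).
SDP_BASE = (
    "https://github.com/NVIDIA/NeMo-speech-data-processor"
    "/blob/main/dataset_configs/italian"
)
SDP_CONFIGS = {
    "Mozilla Common Voice 12.0 (Italian)": f"{SDP_BASE}/mcv/config.yaml",
    "Multilingual LibriSpeech (Italian)": f"{SDP_BASE}/mls/config.yaml",
    "VoxPopuli transcribed subset (Italian)": f"{SDP_BASE}/voxpopuli/config.yaml",
}


def total_hours(hours: dict) -> int:
    """Sum the per-dataset hours (hypothetical helper, for illustration)."""
    return sum(hours.values())


def share_of_total(hours: dict) -> dict:
    """Return each dataset's fraction of the composite dataset."""
    total = total_hours(hours)
    return {name: value / total for name, value in hours.items()}


if __name__ == "__main__":
    # The three parts should add up to the total quoted in the README.
    print("total hours:", total_hours(HOURS_AFTER_CLEANING))
    for name, fraction in share_of_total(HOURS_AFTER_CLEANING).items():
        print(f"{name}: {fraction:.1%} ({SDP_CONFIGS[name]})")
```

The three entries add up to 220 + 214 + 53 = 487, which agrees with the total in the unchanged context line.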