Acoustic and language models

The acoustic model is built on the QuartzNet15x5 architecture and trained with the NeMo toolkit.

Three n-gram language models were created with the KenLM Language Model Toolkit.

Archive                    Size     Link
QuartzNet15x5_golos.nemo   68 MB    https://sc.link/ZMv
KenLMs.tar                 4.8 GB   https://sc.link/YL0
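
A minimal sketch of how the acoustic model archive might be loaded, assuming the NeMo toolkit is installed and QuartzNet15x5_golos.nemo has been downloaded to the working directory. The helper names load_acoustic_model, transcribe_files and main, and the path example.wav, are hypothetical. EncDecCTCModel.restore_from and transcribe are NeMo calls, but their exact arguments and return types can differ between NeMo versions.

```python
from typing import List


def load_acoustic_model(nemo_path: str = "QuartzNet15x5_golos.nemo"):
    """Restore the CTC acoustic model from a local .nemo archive."""
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.EncDecCTCModel.restore_from(restore_path=nemo_path)


def transcribe_files(model, audio_paths: List[str]) -> list:
    """Transcribe a list of audio files with the model's default decoding."""
    # Passed positionally because the keyword name differs across NeMo versions.
    return model.transcribe(audio_paths)


def main() -> None:
    model = load_acoustic_model()
    # "example.wav" is a placeholder path (assumption: 16 kHz mono WAV).
    for text in transcribe_files(model, ["example.wav"]):
        print(text)
```

Calling main() prints one transcript per input file. This sketch does not use the KenLM archives; decoding with a language model needs a separate beam search decoder.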

Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in ML Space, a full-cycle machine learning development platform for data science team collaboration, based on the Christofari supercomputer.

Evaluation

Word Error Rate (WER, in percent) on different test sets

Decoder \ Test set                            Crowd test   Farfield test   MCV¹ dev   MCV¹ test
Greedy decoder                                4.389 %      14.949 %        9.314 %    11.278 %
Beam Search with Common Crawl LM              4.709 %      12.503 %        6.341 %    7.976 %
Beam Search with Golos train set LM           3.548 %      12.384 %        -          -
Beam Search with Common Crawl and Golos LM    3.318 %      11.488 %        6.4 %      8.06 %

¹ Common Voice: Mozilla's initiative to help teach machines how real people speak.
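
For readers who want to reproduce this kind of figure, here is a minimal sketch of the standard WER definition: word-level edit distance divided by the number of reference words. It is an illustration only. The text normalization and scoring tool behind the numbers above are not stated here, and the function names edit_distance, wer_percent and relative_reduction are hypothetical.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_percent(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction in percent between two WER values."""
    return 100.0 * (baseline - improved) / baseline
```

For example, relative_reduction(4.389, 3.318) compares the greedy decoder with the combined Common Crawl and Golos LM on the Crowd test set and gives a relative reduction of roughly 24 %.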

Resources

[arxiv.org] Golos: Russian Dataset for Speech Research

[habr.com] Golos: the largest manually annotated Russian speech dataset, now publicly available

[habr.com] How to improve Russian speech recognition to 3% WER using open data

