Acoustic and language models

The acoustic model is built on the QuartzNet15x5 architecture and trained with the NeMo toolkit.

Three n-gram language models were created with the KenLM Language Model Toolkit.

| Archive | Size | Link |
|---|---|---|
| QuartzNet15x5_golos.nemo | 68 MB | https://sc.link/ZMv |
| KenLMs.tar | 4.8 GB | https://sc.link/YL0 |
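As a rough illustration of how the acoustic model archive might be used once downloaded, the sketch below restores the checkpoint with NeMo and transcribes audio files. It assumes NeMo's ASR collection is installed and that the checkpoint is a CTC model loadable through `EncDecCTCModel`. The helper names `load_golos_model` and `transcribe_files` and the file `example.wav` are hypothetical and used only for illustration. The return type of `transcribe` differs between NeMo versions, so treat the output handling as an assumption.

```python
from pathlib import Path


def load_golos_model(checkpoint_path: str):
    """Restore a .nemo checkpoint (hypothetical helper, not part of NeMo)."""
    path = Path(checkpoint_path)
    if not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    # Imported lazily so the path check above works even without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Assumption: QuartzNet checkpoints are CTC models in NeMo.
    return nemo_asr.models.EncDecCTCModel.restore_from(restore_path=str(path))


def transcribe_files(model, wav_paths):
    """Run the model's built-in (greedy) transcription on a list of audio files."""
    return model.transcribe(list(wav_paths))


# Example usage (requires the downloaded archive and a local audio file):
# model = load_golos_model("QuartzNet15x5_golos.nemo")
# print(transcribe_files(model, ["example.wav"]))
```

This path uses no language model. Decoding with the KenLM models requires a separate beam search decoder, which is not shown here.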

Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in ML Space, a full-cycle machine learning development platform for data science team collaboration, built on the Christofari supercomputer.

Evaluation

Word Error Rate (WER, %) for different test sets

| Decoder \ Test set | Crowd test | Farfield test | MCV¹ dev | MCV¹ test |
|---|---|---|---|---|
| Greedy decoder | 4.389 % | 14.949 % | 9.314 % | 11.278 % |
| Beam Search with Common Crawl LM | 4.709 % | 12.503 % | 6.341 % | 7.976 % |
| Beam Search with Golos train set LM | 3.548 % | 12.384 % | - | - |
| Beam Search with Common Crawl and Golos LM | 3.318 % | 11.488 % | 6.4 % | 8.06 % |

¹ MCV: Common Voice, Mozilla's initiative to help teach machines how real people speak.
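For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate of this kind is commonly computed: the word-level edit distance (substitutions, deletions, insertions) summed over all utterances, divided by the total number of reference words. The exact text normalization and scoring script behind the reported numbers are not given here, so the functions `edit_distance`, `corpus_wer`, and `relative_reduction` are illustrative assumptions, not the authors' evaluation code.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr[j] = min(prev[j] + 1,      # deletion
                          curr[j - 1] + 1,  # insertion
                          substitution)
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER in percent over (reference, hypothesis) string pairs."""
    errors = 0
    ref_length = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        ref_length += len(ref_words)
    if ref_length == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / ref_length


def relative_reduction(baseline_wer, improved_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# One deleted word out of four reference words.
print(corpus_wer([("включи свет на кухне", "включи свет кухне")]))
# Crowd test: greedy decoder vs. beam search with the combined LM.
print(round(relative_reduction(4.389, 3.318), 1))
```

Applied to the Crowd test column, moving from the greedy decoder (4.389 %) to beam search with the combined Common Crawl and Golos LM (3.318 %) is a relative WER reduction of roughly 24 %.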

Resources

[arxiv.org] Golos: Russian Dataset for Speech Research

[habr.com] Golos: the largest manually annotated Russian speech dataset, now openly available (in Russian)

[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)

