Acoustic and language models

The acoustic model is built on the QuartzNet15x5 architecture and trained with the NeMo toolkit.

Three n-gram language models were created with the KenLM Language Model Toolkit:

LM built on Common Crawl Russian dataset
LM built on Golos train set
LM built on Common Crawl and Golos datasets together (50/50)
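
N-gram language models like the ones listed above are commonly combined with the acoustic model during beam search by adding a weighted LM score, plus a word-count bonus, to each candidate's acoustic score. The sketch below illustrates that scoring only. The weights `alpha` and `beta`, the candidate list, and the `toy_lm` scorer are illustrative assumptions, not the settings or code used for Golos. In practice the LM score could come from a KenLM model, for example through the `score` method of `kenlm.Model`.

```python
def fused_score(acoustic_logprob, text, lm_logprob_fn, alpha=0.5, beta=1.0):
    """Acoustic score plus weighted LM score plus a word-count bonus.

    alpha and beta are hypothetical weights chosen for illustration.
    """
    word_count = len(text.split())
    return acoustic_logprob + alpha * lm_logprob_fn(text) + beta * word_count


def rescore(candidates, lm_logprob_fn, alpha=0.5, beta=1.0):
    """Sort (text, acoustic_logprob) pairs best-first by fused score."""
    return sorted(
        candidates,
        key=lambda c: fused_score(c[1], c[0], lm_logprob_fn, alpha, beta),
        reverse=True,
    )


def toy_lm(text):
    """Toy stand-in for an n-gram LM: known words are cheap, others costly."""
    known = {"turn", "on", "the", "light"}
    return sum(-1.0 if word in known else -5.0 for word in text.split())


# Two hypothetical beam candidates with made-up acoustic log-probabilities.
candidates = [("turn on the lite", -3.0), ("turn on the light", -3.4)]
best_text, _ = rescore(candidates, toy_lm)[0]
print(best_text)  # turn on the light
```

The acoustically stronger candidate loses here because the LM penalizes the unknown word, which is the effect an external LM is meant to have on rare misspellings.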

Archives	Size	Links
QuartzNet15x5_golos.nemo	68 MB	https://sc.link/ZMv
KenLMs.tar	4.8 GB	https://sc.link/YL0
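
Once downloaded, the `.nemo` archive can be restored with NeMo. The following is a minimal sketch, assuming the checkpoint is a CTC model that `EncDecCTCModel.restore_from` can load and that NeMo's ASR collection is installed. The function name `transcribe_with_quartznet` and the audio file name in the usage comment are hypothetical.

```python
from pathlib import Path


def transcribe_with_quartznet(checkpoint_path, audio_paths, batch_size=4):
    """Restore a NeMo CTC checkpoint and transcribe the given audio files."""
    checkpoint = Path(checkpoint_path)
    # Fail early with a clear message before importing the heavy NeMo stack.
    if not checkpoint.is_file():
        raise FileNotFoundError(f"NeMo checkpoint not found: {checkpoint}")

    # Imported lazily so this module loads even without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Assumption: the archive holds a CTC model restorable as EncDecCTCModel.
    model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path=str(checkpoint))
    model.eval()
    # The audio list is passed positionally because the keyword name has
    # changed across NeMo releases; the return type also varies by version.
    return model.transcribe([str(p) for p in audio_paths], batch_size=batch_size)


# Usage (hypothetical audio file name):
# transcribe_with_quartznet("QuartzNet15x5_golos.nemo", ["example.wav"])
```

This path corresponds to the greedy decoder; decoding with one of the KenLM models requires a separate beam search step that is not shown here.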

Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in ML Space, a full-cycle machine learning development platform for data science team collaboration built on the Christofari supercomputer.

Evaluation

Word Error Rate (WER), in percent, on different test sets

Decoder \ Test set	Crowd test	Farfield test	MCV¹ dev	MCV¹ test
Greedy decoder	4.389 %	14.949 %	9.314 %	11.278 %
Beam Search with Common Crawl LM	4.709 %	12.503 %	6.341 %	7.976 %
Beam Search with Golos train set LM	3.548 %	12.384 %	-	-
Beam Search with Common Crawl and Golos LM	3.318 %	11.488 %	6.4 %	8.06 %

¹ MCV: Mozilla Common Voice, Mozilla's initiative to help teach machines how real people speak.
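
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The sketch below computes it for a single utterance pair. It is an illustration of the metric, not the evaluation script behind the table, and the function name `word_error_rate` and the example sentences are assumptions. It also skips any text normalization that a real evaluation might apply.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("turn on the light", "turn on light"))  # 25.0
```

Corpus-level figures are usually obtained by summing errors and reference words over all utterances before dividing, instead of averaging per-utterance values.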

Resources

[arxiv.org] Golos: Russian Dataset for Speech Research

[habr.com] Golos: the largest manually annotated Russian-language speech dataset, now openly available (in Russian)

[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)

Papers for SberDevices/quartznet-russian

Golos: Russian Dataset for Speech Research (arXiv:2106.10161, published Jun 18, 2021)

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions (arXiv:1910.10261, published Oct 22, 2019)