Flashlight for Ukrainian

Community

See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk

Overview

This repository contains an acoustic model for Ukrainian trained with the Flashlight framework: https://github.com/flashlight/flashlight/tree/main/flashlight/app/asr

  • Architecture: Conformer (300M parameters)
  • Training data: Common Voice 10 and Voice of America
  • Epochs trained: 410
  • Training time: about a week on an RTX A4000

Quality

  • WER: 9.0777% (that is, a word accuracy of about 90.92%)
  • CER: 1.9839%
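
For readers who want to compute comparable numbers on their own transcripts: WER and CER are conventionally defined as the edit distance between reference and hypothesis, over words or characters respectively, divided by the reference length. Below is a minimal Python sketch of that standard definition. It is not Flashlight's own scoring code; the names `edit_distance` and `error_rate` are illustrative, and counting spaces as characters for CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent; unit is "word" (WER) or "char" (CER)."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references are empty")
    return 100.0 * errors / total


refs = ["привіт світ"]
hyps = ["привіт свт"]  # one character dropped from the second word
print(error_rate(refs, hyps, unit="word"))  # 50.0
print(error_rate(refs, hyps, unit="char"))
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.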

How to test?

Start a container with the CPU build of Flashlight:

docker-compose up

# and in another terminal
docker exec -it flashlight_cpu bash

Run

With the acoustic model (AM) only:

/root/flashlight/build/bin/asr/fl_asr_test --am /models/uk_am.bin --datadir ''  --emission_dir '' --uselexicon false \
 --test /data/rows.lst --tokens /models/tokens.txt --lexicon /models/lexicon.txt --show
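
The file passed to --test is a list file with one utterance per line. As far as the Flashlight ASR documentation describes it, each line holds a sample id, an audio path, the audio duration in milliseconds, and the transcript, separated by spaces. Treat that layout as an assumption and check it against the Flashlight docs for your version. A minimal Python sketch for building such a file from WAV files follows; the helper names are hypothetical and only the standard-library `wave` module is used.

```python
import wave
from pathlib import Path


def wav_duration_ms(path):
    """Duration of a WAV file in milliseconds, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return 1000.0 * wav.getnframes() / wav.getframerate()


def make_list_line(sample_id, wav_path, transcript):
    """One list-file line: id, absolute path, duration (ms), transcript.

    Assumed layout; paths containing spaces would break it.
    """
    wav_path = Path(wav_path).resolve()
    duration = wav_duration_ms(wav_path)
    return f"{sample_id} {wav_path} {duration:.2f} {transcript}"


def write_list_file(samples, out_path):
    """Write (sample_id, wav_path, transcript) tuples to a list file."""
    lines = [make_list_line(*sample) for sample in samples]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `write_list_file([("utt1", "/data/utt1.wav", "привіт світ")], "/data/rows.lst")` would produce a one-line list file, where `/data/utt1.wav` is a placeholder for your own recording.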

With a language model (LM):

/root/flashlight/build/bin/asr/fl_asr_decode \
 --am=/models/uk_am.bin \
 --test=/data/labels_absolute.lst \
 --maxload=3477 \
 --nthread_decoder=2 \
 --show \
 --showletters \
 --lexicon=/models/lexicon.txt \
 --uselexicon=false \
 --lm=/models/lm_4gram_500k.binary \
 --lmtype=kenlm \
 --decodertype=wrd \
 --beamsize=200 \
 --beamsizetoken=200 \
 --beamthreshold=20 \
 --lmweight=0.75 \
 --wordscore=0 \
 --eosscore=0 \
 --silscore=0 \
 --unkscore=0 \
 --smearing=max
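
To give a feel for what the weighting flags above control, here is a simplified Python sketch of how a beam-search decoder typically combines acoustic and language-model scores when ranking a hypothesis. It is a generic illustration under stated assumptions, not Flashlight's implementation: the function name and argument names are hypothetical, the end-of-sentence score is left out, and the defaults simply mirror the values in the command.

```python
def hypothesis_score(am_score, lm_log_prob, n_words, n_silence=0, n_unk=0, *,
                     lmweight=0.75, wordscore=0.0, silscore=0.0, unkscore=0.0):
    """Combined score of one decoding hypothesis (higher is better).

    am_score:    accumulated acoustic-model score of the token path
    lm_log_prob: accumulated language-model log-probability of the words
    n_words, n_silence, n_unk: counts of words, silence tokens, unknown words
    """
    return (am_score
            + lmweight * lm_log_prob   # how much the LM is trusted
            + wordscore * n_words      # per-word bonus or penalty
            + silscore * n_silence     # per-silence bonus or penalty
            + unkscore * n_unk)        # per-unknown-word bonus or penalty


# Two candidate transcripts for the same audio: the second has a slightly
# worse acoustic score but is far more likely under the LM.
a = hypothesis_score(am_score=-10.0, lm_log_prob=-8.0, n_words=2)
b = hypothesis_score(am_score=-11.0, lm_log_prob=-4.0, n_words=2)
print(a, b)  # -16.0 -14.0
```

With the per-token scores set to 0, as in the command, only the acoustic score and the weighted LM score matter in this simplified picture; raising the LM weight makes the decoder favor fluent word sequences over acoustically closer ones.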

How to fine-tune on your own data?

/root/flashlight/build/bin/asr/fl_asr_train continue /models/ --flagsfile /models/train.flags

The /models/ directory must contain the model's .bin files.

Cite this work

@misc {smoliakov_2025,
    author       = { {Smoliakov} },
    title        = { flashlight-uk (Revision 1ac154b) },
    year         = 2025,
    url          = { https://huggingface.co/Yehor/flashlight-uk },
    doi          = { 10.57967/hf/4577 },
    publisher    = { Hugging Face }
}