transiteration / stt_kz_quartznet15x5

Automatic Speech Recognition

transiteration committed on Sep 6, 2023

Commit 5e085ba • 1 Parent(s): 63f1a81

Update README.md

Files changed (1)

README.md +6 -6

@@ -16,8 +16,8 @@ tags:
 ## Model Overview
-In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].
-We advise installing it once you've installed the most recent version of PyTorch.
+In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].\
+We advise installing it once you've installed the most recent version of PyTorch.\
 This model is trained on NVIDIA GeForce RTX 2070:\
 Python 3.7.15\
 NumPy 1.21.6\
@@ -53,7 +53,7 @@ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manife
 ## Input and Output
-This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
+This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
 Then, this model gives you the spoken words in a text format for a given audio sample.
 ## Model Architecture
@@ -62,8 +62,8 @@ Then, this model gives you the spoken words in a text format for a given audio s
 ## Training and Dataset
-The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs.
-[Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.
+The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs.\
+[Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.\
 In total, KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
 ## Performance
@@ -72,7 +72,7 @@ Average WER: 15.53%
 ## Limitations
-Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
+Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
 In general, this makes it faster for inference but might show less overall performance.\
 In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
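
Note on the "Input and Output" hunk: the README asks for mono-channel .WAV input at a sample rate written as "16,000 KHz", which presumably means 16,000 Hz (16 kHz). A minimal sketch of a pre-flight check under that assumption is below. It uses only Python's standard `wave` module; the helper name `check_wav` and both constants are hypothetical and are not part of the model card or of NeMo.

```python
import wave

# Assumption: the README's "16,000 KHz" is read as 16,000 Hz (16 kHz).
EXPECTED_RATE_HZ = 16_000
# Mono input, as stated in the README.
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != EXPECTED_CHANNELS:
            problems.append(f"expected mono, got {channels} channels")
        if rate != EXPECTED_RATE_HZ:
            problems.append(f"expected {EXPECTED_RATE_HZ} Hz, got {rate} Hz")
    return problems
```

Running `check_wav("sample.wav")` on each file listed in the dataset manifest before transcription would flag audio that needs resampling or downmixing first. It inspects only the header, so it does not validate the audio content itself.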