transiteration commited on
Commit
5e085ba
1 Parent(s): 63f1a81

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +6 -6
README.md CHANGED
@@ -16,8 +16,8 @@ tags:
16
 
17
  ## Model Overview
18
 
19
- In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].
20
- We advise installing it once you've installed the most recent version of PyTorch.
21
  This model is trained on NVIDIA GeForce RTX 2070:\
22
  Python 3.7.15\
23
  NumPy 1.21.6\
@@ -53,7 +53,7 @@ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manife
53
 
54
  ## Input and Output
55
 
56
- This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
57
  Then, this model gives you the spoken words in a text format for a given audio sample.
58
 
59
  ## Model Architecture
@@ -62,8 +62,8 @@ Then, this model gives you the spoken words in a text format for a given audio s
62
 
63
  ## Training and Dataset
64
 
65
- The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs.
66
- [Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.
67
  In total, KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
68
 
69
  ## Performance
@@ -72,7 +72,7 @@ Average WER: 15.53%
72
 
73
  ## Limitations
74
 
75
- Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
76
  In general, this makes it faster for inference but might show less overall performance.\
77
  In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
78
 
 
16
 
17
  ## Model Overview
18
 
19
+ In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].\
20
+ We advise installing it once you've installed the most recent version of PyTorch.\
21
  This model is trained on NVIDIA GeForce RTX 2070:\
22
  Python 3.7.15\
23
  NumPy 1.21.6\
 
53
 
54
  ## Input and Output
55
 
56
+ This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
57
  Then, this model gives you the spoken words in a text format for a given audio sample.
58
 
59
  ## Model Architecture
 
62
 
63
  ## Training and Dataset
64
 
65
+ The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs.\
66
+ [Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.\
67
  In total, KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
68
 
69
  ## Performance
 
72
 
73
  ## Limitations
74
 
75
+ Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
76
  In general, this makes it faster for inference but might show less overall performance.\
77
  In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
78