Commit 5e085ba by transiteration
1 Parent(s): 63f1a81
Update README.md
README.md
CHANGED
@@ -16,8 +16,8 @@ tags:
 
 ## Model Overview
 
-In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1]
-We advise installing it once you've installed the most recent version of PyTorch
+In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].\
+We advise installing it once you've installed the most recent version of PyTorch.\
 This model is trained on NVIDIA GeForce RTX 2070:\
 Python 3.7.15\
 NumPy 1.21.6\
@@ -53,7 +53,7 @@ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manife
 
 ## Input and Output
 
-This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
+This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
 Then, this model gives you the spoken words in a text format for a given audio sample.
 
 ## Model Architecture
@@ -62,8 +62,8 @@ Then, this model gives you the spoken words in a text format for a given audio s
 
 ## Training and Dataset
 
-The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs
-[Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus
+The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs.\
+[Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.\
 In total, KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
 
 ## Performance
@@ -72,7 +72,7 @@ Average WER: 15.53%
 
 ## Limitations
 
-Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
+Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
 In general, this makes it faster for inference but might show less overall performance.\
 In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
 
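
Note on the "Input and Output" hunk: the README asks for mono-channel .WAV input and gives the sample rate as "16,000 KHz", which most likely means 16,000 Hz (16 kHz). That reading is an assumption here, not something this commit confirms. Under that assumption, a file could be checked before transcription with a minimal sketch like the one below. It uses only Python's standard `wave` module; the function name `check_wav` and the two constants are hypothetical and are not part of the model repository or of NeMo.

```python
import wave

# Assumption: the README's "16,000 KHz" is read as 16,000 Hz (16 kHz).
EXPECTED_RATE_HZ = 16_000
# The README asks for mono-channel audio.
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channel, got {channels}")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"expected {EXPECTED_RATE_HZ} Hz, got {rate} Hz")
    return problems
```

Calling `check_wav("sample.wav")` on a hypothetical file returns an empty list when the file is mono at 16,000 Hz; any other file would need resampling or downmixing before being listed in the dataset manifest.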