Commit 0cce9b6, committed by transiteration
1 Parent(s): 00c528b
Update README.md
README.md
CHANGED
@@ -17,7 +17,6 @@ tags:
 ## Model Overview
 
 In order to prepare and experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].\
-We advise installing it once you've installed the most recent version of PyTorch.\
 \
 This model have been trained on NVIDIA GeForce RTX 2070:\
 Python 3.7.15\
@@ -26,7 +25,7 @@ PyTorch 1.21.1\
 NVIDIA NeMo 1.7.0
 
 ```
-
+pip3 install nemo_toolkit['all']
 ```
 
 ## Model Usage:
@@ -54,11 +53,11 @@ python3 evaluate.py --model_path /path/to/stt_kz_quartznet15x5.nemo --test_manif
 
 #### How to Transcribe Audio File
 
-
+Sample audio to test the model:
 ```
 wget https://asr-kz-example.s3.us-west-2.amazonaws.com/sample_kz.wav
 ```
-
+This line is to transcribe the single audio:
 ```
 python3 transcibe.py --model_path /path/to/stt_kz_quartznet15x5.nemo --audio_file_path path/to/audio/file
 ```
@@ -85,7 +84,7 @@ through the applying of **Greedy Decoding**.
 
 ## Limitations
 
-Because the GPU has limited power,
+Because the GPU has limited power, lightweight model architecture was used for fine-tuning.\
 In general, this makes it faster for inference but might show less overall performance.\
 In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
 
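
The single-file transcription step documented in this README change could be wrapped in a small Python helper, sketched below. The script name (spelled `transcibe.py`, as in the diff) and the `--model_path` / `--audio_file_path` flags are copied from the README; the helper names `build_transcribe_command` and `transcribe` are hypothetical, and treating the script's standard output as the transcript is an assumption, since the diff does not show what the script prints.

```python
import subprocess
from pathlib import Path
from typing import List

# Script name and flags are taken verbatim from the README diff
# (including the spelling "transcibe.py"). Everything else is illustrative.
SCRIPT = "transcibe.py"


def build_transcribe_command(model_path: str, audio_path: str) -> List[str]:
    """Return the argv list for the single-file transcription command."""
    return [
        "python3",
        SCRIPT,
        "--model_path", model_path,
        "--audio_file_path", audio_path,
    ]


def transcribe(model_path: str, audio_path: str) -> str:
    """Run the script and return its stdout (assumed to hold the transcript)."""
    if not Path(SCRIPT).is_file():
        raise FileNotFoundError(f"{SCRIPT} not found in the current directory")
    result = subprocess.run(
        build_transcribe_command(model_path, audio_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout.strip()
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the model or audio path contains spaces.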