transiteration/stt_kz_quartznet15x5

Automatic Speech Recognition


transiteration committed on Jan 19

Commit 0cce9b6 (1 parent: 00c528b)

Update README.md

Files changed (1): README.md (+4 -5)

@@ -17,7 +17,6 @@ tags:
 ## Model Overview
 In order to prepare and experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].\
-We advise installing it once you've installed the most recent version of PyTorch.\
 \
 This model have been trained on NVIDIA GeForce RTX 2070:\
 Python 3.7.15\
@@ -26,7 +25,7 @@ PyTorch 1.21.1\
 NVIDIA NeMo 1.7.0
 ```
-pip install nemo_toolkit['all']
+pip3 install nemo_toolkit['all']
 ```
 ## Model Usage:
@@ -54,11 +53,11 @@ python3 evaluate.py --model_path /path/to/stt_kz_quartznet15x5.nemo --test_manif
 #### How to Transcribe Audio File
-We can get a sample audio to test the model:
+Sample audio to test the model:
 ```
 wget https://asr-kz-example.s3.us-west-2.amazonaws.com/sample_kz.wav
 ```
-Then this line of code is to transcribe the single audio:
+This line is to transcribe the single audio:
 ```
 python3 transcibe.py --model_path /path/to/stt_kz_quartznet15x5.nemo --audio_file_path path/to/audio/file
 ```
@@ -85,7 +84,7 @@ through the applying of **Greedy Decoding**.
 ## Limitations
-Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
+Because the GPU has limited power, lightweight model architecture was used for fine-tuning.\
 In general, this makes it faster for inference but might show less overall performance.\
 In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
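
Note on the "How to Transcribe Audio File" step: the README drives transcription through a script, whose contents are not part of this commit. The sketch below shows what an equivalent call from Python might look like, not how that script is actually written. It assumes NeMo is installed, that the `.nemo` checkpoint loads as a CTC model (QuartzNet is CTC-based), and the helper name `transcribe_file` is hypothetical.

```python
from pathlib import Path


def transcribe_file(model_path: str, audio_path: str):
    """Transcribe one audio file with a local .nemo checkpoint (sketch)."""
    # Fail early with a clear message before the slow NeMo import.
    for p in (model_path, audio_path):
        if not Path(p).is_file():
            raise FileNotFoundError(f"File not found: {p}")

    # Imported lazily so the path checks also work without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Assumption: the checkpoint restores as EncDecCTCModel.
    model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path=model_path)

    # transcribe() takes a list of audio paths. The element type of the
    # result differs between NeMo versions, so it is returned unchanged.
    return model.transcribe([audio_path])
```

A call such as `transcribe_file("/path/to/stt_kz_quartznet15x5.nemo", "sample_kz.wav")` would then return the transcription for the sample file downloaded above. Checking the paths first keeps a mistyped path from surfacing as a less readable error deep inside model loading.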