transiteration/stt_kz_quartznet15x5

Automatic Speech Recognition

transiteration committed on Sep 6, 2023

Commit 93109b8 • 1 Parent(s): 4689612

Update README.md

Files changed (1)

README.md +47 -4

@@ -1,6 +1,4 @@
 ---
-datasets:
-- Shirali/ISSAI_KSC_335RS_v_1_1
 language:
 - kk
 metrics:
@@ -12,5 +10,50 @@ tags:
 - speech
 - audio
 - NeMo
-- PyTorch
----

+- pytorch
+---
+## Model Overview
+To train, fine-tune, or experiment with the model, you need to install NVIDIA NeMo.
+We recommend installing it after you have installed the latest version of PyTorch.
+```
+pip install "nemo_toolkit[all]"
+```
+## Model Usage
+The model is available in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or for fine-tuning on a different dataset.
+### How to Import
+```
+import nemo.collections.asr as nemo_asr
+asr_model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
+```
+### How to Transcribe a Single Audio File
+```
+asr_model.transcribe(['sample_kz.wav'])
+```
+### How to Transcribe Multiple Audio Files
+```
+python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
+```
+If you have a manifest file that lists your audio files:
+```
+python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manifest=manifest.json
+```
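A NeMo manifest is a text file with one JSON object per line. The sketch below shows one way to build such a file from a directory of WAV files using only the Python standard library. The helper name `build_manifest` and the paths are hypothetical. The `audio_filepath`, `duration`, and `text` keys follow the usual NeMo ASR manifest layout, and leaving `text` empty is an assumption for the case where no reference transcript exists.

```python
import json
import wave
from pathlib import Path


def build_manifest(audio_dir, manifest_path):
    """Write one JSON line per .wav file in audio_dir; return the entry count."""
    entries = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            duration = wav.getnframes() / wav.getframerate()
        entries.append({
            "audio_filepath": str(wav_path.resolve()),
            "duration": round(duration, 3),
            "text": "",  # assumption: no reference transcript is available
        })
    with open(manifest_path, "w", encoding="utf-8") as out:
        for entry in entries:
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return len(entries)
```

For example, `build_manifest("audio/", "manifest.json")` would produce a file that can be passed as `dataset_manifest=manifest.json`. Note that the standard `wave` module reads uncompressed PCM WAV files only.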
+## Input and Output
+This model accepts mono-channel audio (.wav files) with a sample rate of 16 kHz (16,000 Hz) as input.
+It outputs the transcribed speech as text for a given audio sample.
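Before transcribing, it can help to confirm that a file is in the 16 kHz mono format the model expects. The following is a minimal sketch using the standard `wave` module; the helper name `check_wav_format` is hypothetical and is not part of NeMo, and the check only covers uncompressed PCM WAV files.

```python
import wave

EXPECTED_RATE_HZ = 16000  # 16 kHz, i.e. 16,000 samples per second
EXPECTED_CHANNELS = 1     # mono


def check_wav_format(path):
    """Return a list of problems found; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(
                f"expected {EXPECTED_RATE_HZ} Hz, found {wav.getframerate()} Hz"
            )
    return problems
```

A call such as `check_wav_format("sample_kz.wav")` returns an empty list when the file matches. Files that do not match would need to be resampled or downmixed with an audio tool before transcription.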
+## Model Architecture
+QuartzNet [2] is a Jasper-like network that uses separable convolutions and larger filter sizes. Its accuracy is comparable to Jasper's, with far fewer parameters. This particular model has 15 blocks, each repeated 5 times.
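To illustrate why separable convolutions reduce the parameter count, the sketch below compares the number of weights in a standard 1D convolution with the number in a depthwise separable one (a depthwise convolution followed by a 1x1 pointwise convolution). The kernel size and channel count are illustrative values chosen for the example, not figures read from this checkpoint, and biases and normalization parameters are ignored.

```python
def standard_conv1d_params(kernel_size, in_channels, out_channels):
    # Every output channel has a kernel spanning all input channels.
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size, in_channels, out_channels):
    depthwise = kernel_size * in_channels    # one kernel per input channel
    pointwise = in_channels * out_channels   # 1x1 convolution mixing channels
    return depthwise + pointwise


# Illustrative values (assumptions, not taken from this checkpoint).
kernel_size, channels = 33, 256
standard = standard_conv1d_params(kernel_size, channels, channels)
separable = separable_conv1d_params(kernel_size, channels, channels)
print(f"standard: {standard:,}  separable: {separable:,}  "
      f"ratio: {standard / separable:.1f}x")
```

With wide kernels the depthwise term stays small relative to the pointwise term, which is why a separable layer can afford a larger filter size without a matching growth in parameters.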