---
language:
- kk
metrics:
- wer
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- automatic-speech-recognition
- speech
- audio
- NeMo
- pytorch
---

## Model Overview

To prepare, fine-tune, or experiment with the model, you need to install NVIDIA NeMo. We recommend installing it after you have installed the latest version of PyTorch.

```
pip install nemo_toolkit['all']
```

## Model Usage

The model is available in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

### How to Import

```
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
```

### How to Transcribe a Single Audio File

```
asr_model.transcribe(['sample_kz.wav'])
```

### How to Transcribe Multiple Audio Files

Point `audio_dir` at the directory containing your audio files:

```
python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo audio_dir=""
```

If you have a manifest file listing your audio files (a sketch of the expected format appears at the end of this card):

```
python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manifest=manifest.json
```

## Input and Output

This model accepts mono-channel .wav audio files sampled at 16,000 Hz (16 kHz) as input; a conversion sketch appears at the end of this card. It outputs the spoken words as text for the given audio sample.

## Model Architecture

QuartzNet [2] is a Jasper-like network that uses separable convolutions and larger filter sizes. It achieves accuracy comparable to Jasper with far fewer parameters. This particular model has 15 blocks, each repeated 5 times.
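As a quick sanity check on the parameter-count claim above, you can count the model's trainable parameters directly. This is a minimal sketch that assumes the model was restored as `asr_model` as in the import example, and relies only on the fact that a restored NeMo model is a standard PyTorch module:

```
# Count the trainable parameters of the restored model (a torch.nn.Module).
num_params = sum(p.numel() for p in asr_model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")
```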
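For reference, a NeMo dataset manifest is a JSON-lines file with one entry per utterance, each carrying at least an `audio_filepath` and a `duration` (in seconds), plus a `text` field for the reference transcript. Below is a minimal sketch for building one; the file names and durations are hypothetical placeholders:

```
import json

# Hypothetical example entries; replace with your own paths and durations.
# For inference-only manifests, the "text" field can be left empty.
entries = [
    {"audio_filepath": "audio/sample_kz_001.wav", "duration": 3.2, "text": ""},
    {"audio_filepath": "audio/sample_kz_002.wav", "duration": 5.7, "text": ""},
]

# NeMo manifests are JSON lines: one JSON object per line.
with open("manifest.json", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```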
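If your recordings are not already 16 kHz mono .wav files, convert them before transcription. This is a minimal sketch using `librosa` and `soundfile`; these libraries and the input file name are assumptions for illustration, not NeMo requirements, and any resampling tool works:

```
import librosa
import soundfile as sf

# Load any supported audio file, downmixing to mono and resampling to 16 kHz.
audio, sr = librosa.load("input_audio.mp3", sr=16000, mono=True)

# Write a 16 kHz mono-channel .wav file the model can consume directly.
sf.write("sample_kz.wav", audio, sr)
```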