transiteration committed on
Commit
93109b8
1 Parent(s): 4689612

Update README.md

Files changed (1)
  1. README.md +47 -4
README.md CHANGED
@@ -1,6 +1,4 @@
1
  ---
2
- datasets:
3
- - Shirali/ISSAI_KSC_335RS_v_1_1
4
  language:
5
  - kk
6
  metrics:
@@ -12,5 +10,50 @@ tags:
12
  - speech
13
  - audio
14
  - NeMo
15
- - PyTorch
16
- ---
1
  ---
 
 
2
  language:
3
  - kk
4
  metrics:
 
10
  - speech
11
  - audio
12
  - NeMo
13
+ - pytorch
14
+ ---
15
+
16
+
17
+ ## Model Overview
18
+
19
+ To train, fine-tune, or experiment with the model, you need to install NVIDIA NeMo.
20
+ We recommend installing it after you have installed the latest version of PyTorch.
21
+ ```
22
+ pip install "nemo_toolkit[all]"
23
+ ```
24
+
25
+ ## Model Usage
26
+
27
+ The model is available in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
28
+
29
+ ### How to Import
30
+ ```
31
+ import nemo.collections.asr as nemo_asr
32
+ asr_model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
33
+ ```
34
+ ### How to Transcribe a Single Audio File
35
+ ```
36
+ asr_model.transcribe(['sample_kz.wav'])
37
+ ```
38
+ ### How to Transcribe Multiple Audio Files
39
+ ```
40
+ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
41
+ ```
42
+
43
+ If you have a manifest file with your audio files:
44
+ ```
45
+ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manifest=manifest.json
46
+ ```
47
+
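If you do not have a manifest yet, the sketch below shows one way to build it from a directory of WAV files. It assumes the JSON-lines layout commonly used for NeMo manifests (one object per line with `audio_filepath`, `duration`, and `text`). The helper name `build_manifest` is hypothetical, and the `text` field is left empty because no reference transcript is known at inference time.

```python
import json
import wave
from pathlib import Path


def build_manifest(audio_dir, manifest_path):
    """Write one JSON object per line for every .wav file in audio_dir.

    Hypothetical helper: the field names follow the usual NeMo manifest
    layout, which is an assumption and not taken from this README.
    """
    entries = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        # Duration in seconds = number of frames / frames per second.
        with wave.open(str(wav_path), "rb") as wav_file:
            duration = wav_file.getnframes() / wav_file.getframerate()
        entries.append({
            "audio_filepath": str(wav_path),
            "duration": round(duration, 3),
            "text": "",  # no reference transcript at inference time
        })
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for entry in entries:
            manifest.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entries
```

Calling `build_manifest("audio", "manifest.json")` would then produce a file that can be passed as `dataset_manifest=manifest.json`. Here `audio` is a placeholder directory name.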
48
+ ## Input and Output
49
+
50
+ This model accepts mono-channel audio in .wav format with a sample rate of 16 kHz (16,000 Hz).
51
+ It outputs the transcribed speech as text for each audio sample.
52
+
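Before transcribing, it can help to confirm that a file matches the expected input format. The following is a minimal sketch using only Python's standard `wave` module, assuming uncompressed PCM WAV files and a target of mono audio at 16 kHz (16,000 samples per second). The function name `check_wav` is hypothetical and not part of NeMo.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz (16 kHz)
EXPECTED_CHANNELS = 1         # mono


def check_wav(path):
    """Return a list of format problems found in the WAV header.

    An empty list means the file is mono and sampled at 16 kHz.
    Hypothetical helper; it only reads PCM WAV headers via the stdlib.
    """
    problems = []
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"expected {EXPECTED_SAMPLE_RATE} Hz, found {sample_rate} Hz")
    return problems
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.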
53
+ ## Model Architecture
54
+
55
+ QuartzNet [2] is a Jasper-like network that uses separable convolutions and larger filter sizes. It has comparable accuracy to Jasper while having far fewer parameters. This particular model has 15 blocks, each repeated 5 times.
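To illustrate why separable convolutions reduce the parameter count, the sketch below compares the weights of a standard 1D convolution with those of a depthwise-plus-pointwise (separable) one. The channel count and kernel size are illustrative assumptions, not values read from this checkpoint, and bias terms are ignored.

```python
def standard_conv1d_params(in_channels, out_channels, kernel_size):
    """Weights in a regular 1D convolution (bias ignored)."""
    return in_channels * out_channels * kernel_size


def separable_conv1d_params(in_channels, out_channels, kernel_size):
    """Weights in a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = in_channels * kernel_size    # one filter per input channel
    pointwise = in_channels * out_channels   # 1x1 convolution mixing channels
    return depthwise + pointwise


# Illustrative sizes only (assumed for this example).
channels, kernel = 256, 33
standard = standard_conv1d_params(channels, channels, kernel)
separable = separable_conv1d_params(channels, channels, kernel)
print(standard, separable, round(standard / separable, 1))
```

With these assumed sizes, the standard layer has 2,162,688 weights and the separable layer has 73,984, roughly 29 times fewer.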
56
+
57
+
58
+
59
+