kaushal98b committed on
Commit
22fed52
1 Parent(s): baba7fe

Update README.md

Files changed (1)
  1. README.md +34 -37
README.md CHANGED
@@ -7,7 +7,21 @@ library_name: nemo
7
  ---
8
  ## IndicConformer
9
 
10
- IndicConformer is a Hybrid RNNT conformer model built for Hindi.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
11
 
12
  ## AI4Bharat NeMo:
13
 
@@ -17,47 +31,30 @@ library_name: nemo
17
  ```
18
 
19
  ## Usage
20
-
21
- ```bash
22
- $ python inference.py --help
23
- usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE
24
- options:
25
- -h, --help show this help message and exit
26
- -c CHECKPOINT, --checkpoint CHECKPOINT
27
- Path to .nemo file
28
- -f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
29
- Audio filepath
30
- -d (cpu,cuda), --device (cpu,cuda)
31
- Device (cpu/gpu)
32
- -l LANGUAGE_CODE, --language_code LANGUAGE_CODE
33
- Language Code (eg. hi)
34
  ```
 
35
 
36
- ## Example command
 
 
37
  ```
38
- python inference.py -c indicconformer_stt_hi_hybrid_rnnt_large.nemo -f hindi-16khz.wav -d cuda -l hi
 
 
39
  ```
40
- Expected output -
41
 
 
 
42
  ```
43
- Loading model..
44
- ...
45
- Transcibing..
46
- ----------
47
- Transcript:
48
- Took ** seconds.
49
- ----------
50
  ```
51
 
52
- ### Input
53
-
54
- This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
55
-
56
- ### Output
57
-
58
- This model provides transcribed speech as a string for a given audio sample.
59
-
60
- ## Model Architecture
61
-
62
- This model is a conformer-Large model, consisting of 120M parameters, as the encoder, with a hybrid CTC-RNNT decoder. The model has 17 conformer blocks with
63
- 512 as the model dimension.
 
7
  ---
8
  ## IndicConformer
9
 
10
+ IndicConformer is a hybrid CTC-RNNT Conformer ASR (Automatic Speech Recognition) model built for Hindi.
11
+
12
+ ### Input
13
+
14
+ This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
15
+
16
+ ### Output
17
+
18
+ This model provides transcribed speech as a string for a given audio sample.
19
+
20
+ ## Model Architecture
21
+
22
+ This model uses a Conformer-Large encoder with 120M parameters and a hybrid CTC-RNNT decoder. The model has 17 Conformer blocks with
23
+ a model dimension of 512.
24
+
25
 
26
  ## AI4Bharat NeMo:
27
 
 
31
  ```
32
 
33
  ## Usage
34
+ Download and load the model from Hugging Face.
 
 
 
 
 
 
 
 
 
 
 
 
 
35
  ```
36
+ import torch
+ import nemo.collections.asr as nemo_asr
+
+ model = nemo_asr.models.ASRModel.from_pretrained("ai4bharat/indicconformer_stt_hi_hybrid_rnnt_large")
37
 
38
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
39
+ model.freeze() # inference mode
40
+ model = model.to(device) # transfer model to device
41
  ```
42
+ Prepare an audio file by running the command below in your terminal. It converts the audio to 16000 Hz, mono-channel.
43
+ ```
44
+ ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav
45
  ```
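Before running inference, it can help to confirm that the converted file really is 16000 Hz mono. The following is a minimal sketch using only Python's standard `wave` module; the helper name `is_infer_ready` is hypothetical and not part of this repository, and `wave` reads uncompressed PCM WAV files only.

```python
import wave


def is_infer_ready(path, expected_rate=16000, expected_channels=1):
    """Return True if the WAV file at `path` has the expected rate and channel count.

    Hypothetical helper (not part of the README): it only inspects the
    WAV header and does not check the audio content itself.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == expected_rate
            and wav.getnchannels() == expected_channels
        )
```

For example, `is_infer_ready("sample_audio_infer_ready.wav")` should return `True` for a file produced by the ffmpeg command above, assuming ffmpeg wrote a standard PCM WAV file.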
 
46
 
47
+
48
+ ### Inference using CTC decoder
49
  ```
50
+ model.cur_decoder = "ctc"
51
+ ctc_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, logprobs=False, language_id='hi')[0]
52
+ print(ctc_text)
 
 
 
 
53
  ```
54
 
55
+ ### Inference using RNNT decoder
56
+ ```
57
+ model.cur_decoder = "rnnt"
58
+ rnnt_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, language_id='hi')[0]
59
+ print(rnnt_text)
60
+ ```
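The two decoder snippets above differ only in the `cur_decoder` value and the extra `logprobs=False` argument for CTC, so they can be wrapped in one helper to compare both outputs for the same file. This is a sketch, not the authors' code: the function name `transcribe_with_both_decoders` is hypothetical, and it assumes `model` behaves like the object loaded in the Usage section, with a `cur_decoder` attribute and a `transcribe` method that accepts the keyword arguments shown in the README.

```python
def transcribe_with_both_decoders(model, audio_path, language_id="hi"):
    """Run the README's CTC and RNNT calls on one file and return both results.

    Hypothetical helper: the `transcribe` arguments mirror the README
    snippets and are not verified against any other NeMo version.
    """
    outputs = {}
    for decoder in ("ctc", "rnnt"):
        model.cur_decoder = decoder  # select the decoder, as in the README
        kwargs = {"batch_size": 1, "language_id": language_id}
        if decoder == "ctc":
            kwargs["logprobs"] = False  # only the README's CTC call passes this
        # The README takes element [0] of the returned value for a single file.
        outputs[decoder] = model.transcribe([audio_path], **kwargs)[0]
    return outputs
```

Calling `transcribe_with_both_decoders(model, "sample_audio_infer_ready.wav")` would return a dictionary with the keys `"ctc"` and `"rnnt"`, which makes it easy to compare the two transcripts side by side.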