---
license: mit
language:
- hi
pipeline_tag: automatic-speech-recognition
library_name: nemo
---
## IndicConformer
IndicConformer is a hybrid CTC-RNNT Conformer ASR (Automatic Speech Recognition) model.
### Language
Hindi
### Input
This model accepts 16 kHz (16,000 Hz) mono-channel audio (wav files) as input.
### Output
This model provides transcribed speech as a string for a given audio sample.
## Model Architecture
This model uses a Conformer-Large encoder (~120M parameters) with a hybrid CTC-RNNT decoder. The encoder has 17 Conformer blocks with a model dimension of 512.
## AI4Bharat NeMo
To load, train, fine-tune, or experiment with the model, you will need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the command shown below:
```sh
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
```
## Usage
Download and load the model from Hugging Face.
```python
import torch
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.ASRModel.from_pretrained("ai4bharat/indicconformer_stt_hi_hybrid_rnnt_large")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.freeze() # inference mode
model = model.to(device) # transfer model to device
```
Prepare an audio file by running the command below in your terminal. This converts the audio to 16,000 Hz, mono-channel wav.
```sh
ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav
```
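Because `transcribe` expects 16 kHz mono input, it can help to verify a file's format before running inference. Below is a minimal sketch using only the Python standard library; it writes its own synthetic clip for demonstration, and `check_asr_format` is a hypothetical helper name, not part of NeMo:

```python
import wave

def check_asr_format(path):
    """Return (sample_rate, channels) so callers can verify 16 kHz mono."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels()

# Demo on a synthetic 1-second silent clip (stands in for the converted file).
with wave.open("demo_16k_mono.wav", "wb") as wf:
    wf.setnchannels(1)          # mono
    wf.setsampwidth(2)          # 16-bit PCM
    wf.setframerate(16000)      # 16 kHz
    wf.writeframes(b"\x00\x00" * 16000)  # one second of silence

rate, channels = check_asr_format("demo_16k_mono.wav")
```

If `rate` is not 16000 or `channels` is not 1, re-run the ffmpeg command above before transcribing.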
### Inference using CTC decoder
```python
model.cur_decoder = "ctc"
ctc_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, logprobs=False, language_id='hi')[0]
print(ctc_text)
```
### Inference using RNNT decoder
```python
model.cur_decoder = "rnnt"
rnnt_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, language_id='hi')[0]
print(rnnt_text)
```
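Since `transcribe` takes a list of file paths, multiple files can be batched in one call. Below is a minimal sketch for collecting converted wav files from a directory; `collect_wavs` and `audio_dir` are hypothetical names, not part of NeMo:

```python
from pathlib import Path

def collect_wavs(directory):
    """Gather wav files (sorted for reproducible ordering) to pass to model.transcribe."""
    return sorted(str(p) for p in Path(directory).glob("*.wav"))

# Hypothetical usage with the model loaded above:
# wav_paths = collect_wavs("audio_dir")
# texts = model.transcribe(wav_paths, batch_size=4, language_id='hi')
```

Increasing `batch_size` speeds up transcription of many short clips at the cost of more GPU memory.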