metadata
license: mit
language:
- mr
pipeline_tag: automatic-speech-recognition
library_name: nemo
IndicConformer
IndicConformer is a hybrid CTC-RNNT Conformer model for Marathi automatic speech recognition.
AI4Bharat NeMo:
To load, train, fine-tune, or experiment with the model, you need to install AI4Bharat NeMo. We recommend installing it with the following command:
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
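After the install script finishes, a quick check that the `nemo` package is visible to your Python interpreter can save debugging time later. The snippet below is a minimal sketch using only the standard library; the helper name `is_package_available` is hypothetical and not part of NeMo.

```python
import importlib.util


def is_package_available(name):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "nemo" is the top-level package name that NeMo installs.
    if is_package_available("nemo"):
        print("nemo package found")
    else:
        print("nemo package not found; re-run the install command")
```

This only confirms that the package can be located. It does not verify that the AI4Bharat fork, rather than upstream NeMo, is the one installed.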
Usage
$ python inference.py --help
usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE
options:
-h, --help show this help message and exit
-c CHECKPOINT, --checkpoint CHECKPOINT
Path to .nemo file
-f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
Audio filepath
-d (cpu,cuda), --device (cpu,cuda)
Device (cpu/gpu)
-l LANGUAGE_CODE, --language_code LANGUAGE_CODE
Language Code (eg. hi)
Example command
python inference.py -c indicconformer_stt_mr_hybrid_rnnt_large.nemo -f marathi-16khz.wav -d cuda -l mr
Expected output:
Loading model..
...
Transcribing..
----------
Transcript:
Took ** seconds.
----------
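If you want to call the script from Python, for example to transcribe several files in a loop, the command line above can be assembled programmatically. This is a minimal sketch that relies only on the flags listed in the help text (`-c`, `-f`, `-d`, `-l`). The function names `build_inference_command` and `run_inference` are hypothetical, and the default language code `mr` is an assumption based on this model's target language.

```python
import subprocess
import sys

# Device values taken from the help text of inference.py.
ALLOWED_DEVICES = ("cpu", "cuda")


def build_inference_command(checkpoint, audio_filepath, device="cpu",
                            language_code="mr", script="inference.py"):
    """Build the argument list for inference.py without running it."""
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {ALLOWED_DEVICES}, got {device!r}")
    return [
        sys.executable, script,
        "-c", checkpoint,
        "-f", audio_filepath,
        "-d", device,
        "-l", language_code,
    ]


def run_inference(checkpoint, audio_filepath, device="cpu", language_code="mr"):
    """Run inference.py and return whatever it prints to stdout."""
    cmd = build_inference_command(checkpoint, audio_filepath, device, language_code)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces. `run_inference` assumes `inference.py` is in the current working directory.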
Input
This model accepts 16 kHz (16,000 Hz) mono-channel audio (WAV files) as input.
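Audio with a different sample rate or channel count may transcribe poorly, so it can help to check a file before running inference. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper name `check_wav_format` is hypothetical.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channel(s), expected {expected_channels}")
    return problems
```

A file that fails the check can be resampled and downmixed with an external tool such as `ffmpeg` or `sox` before it is passed to the model.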
Output
This model provides transcribed speech as a string for a given audio sample.
Model Architecture
This model uses a Conformer-Large model (120M parameters) as the encoder, paired with a hybrid CTC-RNNT decoder. The encoder has 17 Conformer blocks with a model dimension of 512.