README.md · ai4bharat/indicconformer_stt_mr_hybrid_ctc_rnnt_large at 7ec1c4b52a050751aee6852fb1ed000924569a4f

metadata

license: mit
language:
  - mr
pipeline_tag: automatic-speech-recognition
library_name: nemo

IndicConformer

IndicConformer is a hybrid CTC-RNNT Conformer model built for Marathi.

AI4Bharat NeMo:

To load, train, fine-tune, or experiment with the model, you need to install AI4Bharat NeMo. We recommend installing it with the following command:

git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh

Usage

$ python inference.py --help
usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE
options:
-h, --help            show this help message and exit
-c CHECKPOINT, --checkpoint CHECKPOINT
                        Path to .nemo file
-f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
                        Audio filepath
-d (cpu,cuda), --device (cpu,cuda)
                        Device (cpu/gpu)
-l LANGUAGE_CODE, --language_code LANGUAGE_CODE
                        Language Code (eg. hi)

Example command

python inference.py -c indicconformer_stt_mr_hybrid_rnnt_large.nemo -f hindi-16khz.wav -d cuda -l mr

Expected output:

Loading model..
...
Transcibing..
----------
Transcript: 
Took ** seconds.
----------
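
To transcribe several files, the command above can be wrapped in a small script. The following is a minimal sketch, assuming `inference.py` is in the current working directory; the function names `build_inference_command` and `transcribe_files` are hypothetical helpers introduced here, and only the `-c`, `-f`, `-d`, and `-l` flags come from the help text above.

```python
import subprocess
import sys


def build_inference_command(checkpoint, audio_path, device="cpu", language_code="mr"):
    """Build the argument list for inference.py from the flags in its help text."""
    if device not in ("cpu", "cuda"):
        raise ValueError(f"device must be 'cpu' or 'cuda', got {device!r}")
    return [
        sys.executable, "inference.py",  # assumption: script is in the current directory
        "-c", str(checkpoint),           # path to the .nemo file
        "-f", str(audio_path),           # audio file to transcribe
        "-d", device,                    # cpu or cuda
        "-l", language_code,             # language code, e.g. mr for Marathi
    ]


def transcribe_files(checkpoint, audio_paths, device="cpu", language_code="mr", dry_run=True):
    """Return one command per audio file; run them only when dry_run is False."""
    commands = [
        build_inference_command(checkpoint, path, device, language_code)
        for path in audio_paths
    ]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if inference.py exits with an error
            subprocess.run(command, check=True)
    return commands
```

Calling `transcribe_files("indicconformer_stt_mr_hybrid_rnnt_large.nemo", ["a.wav", "b.wav"], device="cuda", dry_run=False)` would invoke the script once per file. Each invocation is a separate process, so the model is presumably reloaded every time; for large batches, loading the checkpoint once inside a single Python process is likely to be faster.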

Input

This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
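
Before running inference, it can help to confirm that a file matches this format. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` is a hypothetical helper, and the check reads only the WAV header, so it does not validate the audio content itself.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems found in the WAV header (empty if none)."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channel(s), expected {expected_channels}")
    return problems
```

An empty list means the header reports 16000 Hz mono audio. Files that fail the check need to be resampled or downmixed with an audio tool of your choice before they are passed to the model.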

Output

This model provides transcribed speech as a string for a given audio sample.

Model Architecture

This model uses a Conformer-Large encoder with 120M parameters, paired with a hybrid CTC-RNNT decoder. The model has 17 Conformer blocks with a model dimension of 512.
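
As a rough sanity check on these numbers, the weight count of the Conformer blocks can be estimated from the block count and model dimension. The sketch below is a back-of-the-envelope calculation, not the model's actual configuration: the feed-forward expansion factor of 4, the convolution kernel size of 31, and the extra relative-position projection in the attention module are all assumptions based on common Conformer setups, and biases, normalization layers, the subsampling front end, and the decoders are ignored.

```python
def conformer_block_weights(d_model, ff_expansion=4, conv_kernel=31):
    """Approximate weight count of one Conformer block (biases and norms ignored)."""
    # Two feed-forward modules, each with two linear layers (d -> 4d -> d).
    feed_forward = 2 * (2 * d_model * ff_expansion * d_model)
    # Query, key, value, output, and relative-position projections (assumed).
    attention = 5 * d_model * d_model
    # Pointwise conv (d -> 2d), depthwise conv (kernel * d), pointwise conv (d -> d).
    convolution = 3 * d_model * d_model + conv_kernel * d_model
    return feed_forward + attention + convolution


NUM_BLOCKS = 17  # from the model description
D_MODEL = 512    # from the model description

encoder_block_weights = NUM_BLOCKS * conformer_block_weights(D_MODEL)
print(f"{encoder_block_weights / 1e6:.1f}M weights in the Conformer blocks")
```

Under these assumptions, the 17 blocks account for roughly 107M weights, which is of the same order as the stated 120M figure. The exact total depends on configuration details that are not given here.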