---
license: mit
language:
- hi
pipeline_tag: automatic-speech-recognition
library_name: nemo
---
## IndicConformer

IndicConformer is a hybrid CTC-RNNT Conformer model for Hindi automatic speech recognition.

## AI4Bharat NeMo

To load, train, fine-tune, or experiment with the model, you need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the following command:
```bash
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
```

## Usage

Run `inference.py` with a `.nemo` checkpoint, an audio file, a device, and a language code. Its help output lists the arguments:

```bash
$ python inference.py --help
usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE

options:
  -h, --help            show this help message and exit
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Path to .nemo file
  -f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
                        Audio filepath
  -d (cpu,cuda), --device (cpu,cuda)
                        Device (cpu/gpu)
  -l LANGUAGE_CODE, --language_code LANGUAGE_CODE
                        Language Code (eg. hi)
```

## Example command
```bash
python inference.py -c indicconformer_stt_hi_hybrid_rnnt_large.nemo -f hindi-16khz.wav -d cuda -l hi
```
Expected output:

```
Loading model..
...
Transcibing..
----------
Transcript:
Took ** seconds.
----------
```
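The example command above transcribes a single file. To process a folder of recordings, one option is to call `inference.py` once per file. The bash sketch below is hypothetical and not part of AI4Bharat NeMo: `pick_device` and `transcribe_dir` are illustrative names, the script passes only the flags documented in the help output, and it assumes that a working `nvidia-smi` means CUDA is available.

```bash
# Hypothetical batch helper around inference.py (not shipped with AI4Bharat NeMo).

# pick_device -> prints "cuda" if nvidia-smi runs successfully, otherwise "cpu".
# Assumption: a working nvidia-smi is treated as a proxy for CUDA availability.
pick_device() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo cuda
  else
    echo cpu
  fi
}

# transcribe_dir CHECKPOINT DIR -> runs inference.py once for each .wav file in DIR.
transcribe_dir() {
  local checkpoint="$1" dir="$2" device wav
  device="$(pick_device)"
  for wav in "$dir"/*.wav; do
    # With no matches the glob stays literal, so skip paths that do not exist.
    [ -e "$wav" ] || continue
    echo "== $wav"
    python inference.py -c "$checkpoint" -f "$wav" -d "$device" -l hi
  done
}
```

For example, `transcribe_dir indicconformer_stt_hi_hybrid_rnnt_large.nemo ./audio` runs the example command on every `.wav` file in a hypothetical `./audio` directory. Because each call reloads the checkpoint, this approach favors simplicity over speed.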

### Input

This model accepts 16 kHz (16,000 Hz), mono-channel audio in WAV format as input.
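Before running inference, it can help to confirm that a file really is 16 kHz mono. The bash sketch below is illustrative only: `wav_info`, `is_16k_mono`, and `to_16k_mono` are hypothetical helper names, `wav_info` assumes a canonical WAV header (the `fmt ` chunk immediately after the 12-byte RIFF header) read on a little-endian machine, and `to_16k_mono` assumes `ffmpeg` is installed.

```bash
# Hypothetical helpers for checking and preparing input audio.

# wav_info FILE -> prints "<channels> <sample_rate>" from a canonical WAV header.
# Assumes the "fmt " chunk directly follows the RIFF header, so the channel
# count sits at byte offset 22 and the sample rate at offset 24.
# od reads these fields in host byte order, so this assumes a little-endian host.
wav_info() {
  local file="$1" channels rate
  channels="$(od -An -t u2 -j 22 -N 2 "$file" | tr -d ' ')"
  rate="$(od -An -t u4 -j 24 -N 4 "$file" | tr -d ' ')"
  echo "$channels $rate"
}

# is_16k_mono FILE -> exit status 0 if the header reports 1 channel at 16000 Hz.
is_16k_mono() {
  [ "$(wav_info "$1")" = "1 16000" ]
}

# to_16k_mono INPUT OUTPUT.wav -> resample to 16 kHz (-ar) and downmix to mono (-ac).
to_16k_mono() {
  ffmpeg -i "$1" -ar 16000 -ac 1 "$2"
}
```

For example, `to_16k_mono recording.mp3 hindi-16khz.wav` (with a hypothetical source file `recording.mp3`) produces a file suitable for the example command, and `is_16k_mono hindi-16khz.wav && echo ok` checks it afterwards.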

### Output

This model returns the transcription of the input audio sample as a text string.

## Model Architecture

This model is a Conformer-Large model with 120M parameters. It combines a Conformer encoder with a hybrid CTC-RNNT decoder. The encoder has 17 Conformer blocks with a model dimension of 512.
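As a rough consistency check, the encoder's share of the parameter count can be estimated from the block count and the model dimension $d = 512$. This is a back-of-the-envelope sketch under assumed layer shapes, not an official breakdown: it assumes the usual Conformer block with two feed-forward modules using an expansion factor of 4, multi-head self-attention with relative positional encoding (query, key, value, output, and positional projections), and a convolution module with two pointwise convolutions. Biases, normalization layers, and the depthwise convolution are ignored because they grow only linearly in $d$.

$$
\begin{aligned}
\text{feed-forward (2 modules)} &: 2 \times (d \cdot 4d + 4d \cdot d) = 16d^2 \\
\text{self-attention } (W_Q, W_K, W_V, W_O, W_{\text{pos}}) &: 5d^2 \\
\text{convolution (pointwise } d \to 2d \text{ and } d \to d) &: 2d^2 + d^2 = 3d^2 \\
\text{per block} &\approx 24d^2 = 24 \times 512^2 \approx 6.29\text{M} \\
\text{17 blocks} &\approx 17 \times 6.29\text{M} \approx 107\text{M}
\end{aligned}
$$

Under these assumptions the Conformer blocks account for roughly 107M of the stated 120M parameters; the remainder is plausibly in the input subsampling layers and the CTC and RNNT decoders.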