---
license: mit
language:
- hi
pipeline_tag: automatic-speech-recognition
library_name: nemo
---
## IndicConformer
IndicConformer is a hybrid CTC-RNNT Conformer model built for Hindi.
## AI4Bharat NeMo
To load, train, fine-tune, or experiment with the model, you need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the following command:
```
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
```
## Usage
```bash
$ python inference.py --help
usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE

options:
  -h, --help            show this help message and exit
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Path to .nemo file
  -f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
                        Audio filepath
  -d (cpu,cuda), --device (cpu,cuda)
                        Device (cpu/gpu)
  -l LANGUAGE_CODE, --language_code LANGUAGE_CODE
                        Language Code (eg. hi)
```
## Example command
```
python inference.py -c indicconformer_stt_hi_hybrid_rnnt_large.nemo -f hindi-16khz.wav -d cuda -l hi
```
Expected output:
```
Loading model..
...
Transcibing..
----------
Transcript:
Took ** seconds.
----------
```
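If you want to call the script from Python rather than from a shell, the example command above can be wrapped with `subprocess`. This is a minimal sketch, not part of the repository: the helper names `build_inference_command` and `run_inference` are hypothetical, and it assumes `inference.py` is in the current working directory and accepts exactly the flags listed under Usage.

```python
import subprocess
import sys


def build_inference_command(checkpoint, audio_filepath, device="cpu", language_code="hi"):
    """Build the argument list for inference.py (hypothetical helper).

    Only the flags shown in the --help output above are used.
    """
    # The help text lists cpu and cuda as the accepted device values.
    if device not in ("cpu", "cuda"):
        raise ValueError("device must be 'cpu' or 'cuda'")
    return [
        sys.executable, "inference.py",
        "-c", checkpoint,
        "-f", audio_filepath,
        "-d", device,
        "-l", language_code,
    ]


def run_inference(checkpoint, audio_filepath, device="cpu", language_code="hi"):
    """Run inference.py and return whatever it prints to stdout."""
    cmd = build_inference_command(checkpoint, audio_filepath, device, language_code)
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_inference("indicconformer_stt_hi_hybrid_rnnt_large.nemo", "hindi-16khz.wav", device="cuda")` would issue the same command as the one shown above.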
### Input
This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
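Before running inference, it can help to confirm that a file matches this format. The sketch below uses only Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of the repository. It only inspects the WAV header and does not resample or downmix.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return (ok, details) for a WAV file (hypothetical helper).

    ok is True only when the file has the expected sample rate
    and number of channels.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()      # samples per second, e.g. 16000
        channels = wav.getnchannels()  # 1 for mono, 2 for stereo
    ok = rate == expected_rate and channels == expected_channels
    return ok, {"sample_rate": rate, "channels": channels}
```

If the check fails, convert the file to 16 kHz mono with an audio tool of your choice before passing it to the model.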
### Output
This model provides transcribed speech as a string for a given audio sample.
## Model Architecture
This model uses a Conformer-Large model (120M parameters) as the encoder, with a hybrid CTC-RNNT decoder. It has 17 Conformer blocks with a model dimension of 512.