---
license: mit
language:
- hi
pipeline_tag: automatic-speech-recognition
library_name: nemo
---

## IndicConformer

IndicConformer is a hybrid CTC-RNNT Conformer model built for Hindi.

## AI4Bharat NeMo

To load, train, fine-tune, or experiment with the model, you need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the following command:

```
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
```

## Usage

```bash
$ python inference.py --help
usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE

options:
  -h, --help            show this help message and exit
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Path to .nemo file
  -f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
                        Audio filepath
  -d (cpu,cuda), --device (cpu,cuda)
                        Device (cpu/gpu)
  -l LANGUAGE_CODE, --language_code LANGUAGE_CODE
                        Language Code (eg. hi)
```

## Example command

```
python inference.py -c indicconformer_stt_hi_hybrid_rnnt_large.nemo -f hindi-16khz.wav -d cuda -l hi
```

Expected output:

```
Loading model..
...
Transcibing..
----------
Transcript:
Took ** seconds.
----------
```
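
If you want to call the script from Python rather than from a shell, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_inference_command` and `run_inference` are hypothetical, and the only things taken from this card are the four flags (`-c`, `-f`, `-d`, `-l`) and the two accepted device values. It assumes `inference.py` is in the current working directory.

```python
import subprocess
import sys

# Device values accepted by inference.py, per its --help output.
ALLOWED_DEVICES = ("cpu", "cuda")


def build_inference_command(checkpoint, audio_filepath, device="cpu",
                            language_code="hi", script="inference.py"):
    """Build the argument list for inference.py (hypothetical helper)."""
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {ALLOWED_DEVICES}, got {device!r}")
    return [
        sys.executable, script,
        "-c", checkpoint,
        "-f", audio_filepath,
        "-d", device,
        "-l", language_code,
    ]


def run_inference(checkpoint, audio_filepath, device="cpu", language_code="hi"):
    """Run inference.py and return everything it printed to stdout."""
    cmd = build_inference_command(checkpoint, audio_filepath, device, language_code)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example (requires the checkpoint, the audio file, and inference.py on disk):
# print(run_inference("indicconformer_stt_hi_hybrid_rnnt_large.nemo",
#                     "hindi-16khz.wav", device="cuda", language_code="hi"))
```

Passing the arguments as a list avoids shell quoting issues with file paths that contain spaces. Note that the device is validated against `cuda`, not `gpu`, because that is the value the `-d` flag lists.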

### Input

This model accepts 16 kHz (16,000 Hz) mono-channel audio (WAV files) as input.
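
Before running inference, it can help to confirm that a file matches this input format. The sketch below uses only Python's standard `wave` module; the function name `check_wav_format` is a hypothetical helper, and it only handles uncompressed PCM WAV files, which is all the `wave` module reads.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, i.e. 16 kHz
EXPECTED_CHANNELS = 1         # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means 16 kHz mono."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

If the list is non-empty, resample or downmix the file with an audio tool of your choice before passing it to the model.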

### Output

This model provides transcribed speech as a string for a given audio sample.

## Model Architecture

This model uses a Conformer-Large encoder with 120M parameters and a hybrid CTC-RNNT decoder. The encoder has 17 Conformer blocks with a model dimension of 512.
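
As a rough sanity check on these numbers, the weight count of the Conformer blocks can be estimated from the block count and model dimension alone. This is a back-of-the-envelope sketch under assumptions that this card does not state: the standard Conformer block layout (two feed-forward modules, one self-attention module, one convolution module), a feed-forward expansion factor of 4, and a pointwise expansion factor of 2 in the convolution module. It ignores biases, normalization layers, the depthwise convolution, positional projections, the subsampling front end, and the decoder.

```python
def estimate_conformer_block_params(num_blocks=17, d_model=512, ff_expansion=4):
    """Rough weight count for the Conformer blocks only (assumed layout)."""
    d2 = d_model * d_model

    # Two feed-forward modules, each with two linear layers of size
    # d_model x (ff_expansion * d_model).
    feed_forward = 2 * (2 * ff_expansion * d2)

    # Self-attention: query, key, value, and output projections.
    attention = 4 * d2

    # Convolution module: pointwise d -> 2d (2 * d^2) plus pointwise d -> d (d^2).
    convolution = 3 * d2

    per_block = feed_forward + attention + convolution
    return num_blocks * per_block


print(f"{estimate_conformer_block_params() / 1e6:.1f}M")  # prints "102.5M"
```

Under these assumptions the blocks alone account for roughly 100M weights, which is the same order of magnitude as the 120M figure above; the terms the sketch leaves out would make up the rest.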