parthiv11 committed 0e505b6 (1 parent: d8a6e4e)

Upload config

Files changed (1): readme_template.md (new file, +107 -0)
---
language:
- hi
license: cc-by-4.0
library_name: nemo
datasets:
- ULCA
- KathBath
- Shrutilipi
- MUCS
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- hindi
- ai4bharat
- CTC
- Conformer
- Transformer
- NeMo
- pytorch
model-index:
- name: stt_hi_conformer_ctc_large_v2
  results: []

---

## Model Overview

<DESCRIBE IN ONE LINE THE MODEL AND ITS USE>

## NVIDIA NeMo: Training

To train, fine-tune, or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest version of PyTorch.

```shell
pip install "nemo_toolkit[all]"
```

## How to Use this Model

The model is available in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or fine-tuned on another dataset.
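
Fine-tuning in NeMo is typically driven by JSON-lines manifest files, one object per utterance with `audio_filepath`, `duration`, and `text` fields. The sketch below builds such a manifest from WAV files and their reference transcripts using only the Python standard library. The function names `wav_duration_seconds` and `build_manifest`, and the `(audio_path, transcript)` input format, are hypothetical choices for illustration, not part of NeMo.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(pairs, manifest_path):
    """Write a NeMo-style JSON-lines manifest and return the number of entries.

    Hypothetical helper (not part of NeMo): `pairs` is an iterable of
    (audio_path, transcript) tuples. Each output line holds the
    `audio_filepath`, `duration` and `text` fields.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio_path, transcript in pairs:
            entry = {
                "audio_filepath": str(Path(audio_path).resolve()),
                "duration": round(wav_duration_seconds(audio_path), 3),
                "text": transcript,
            }
            # ensure_ascii=False keeps Devanagari transcripts human-readable.
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The resulting file is the kind of path NeMo ASR configs typically expect in fields such as `model.train_ds.manifest_filepath`; check the config shipped with the NeMo version you use.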

### Automatically instantiate the model

```python
import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hugging Face Hub and restore it as a NeMo ASR model.
asr_model = nemo_asr.models.ASRModel.from_pretrained("parthiv11/stt_hi_conformer_ctc_large_v2")
```
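
### Transcribing using Python

First, download a sample audio file. This sample comes from NVIDIA's English NeMo model cards, so use a Hindi recording instead to see meaningful output from this model:

```shell
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
```

Then run:

```python
# Transcribe a list of audio file paths; one result is returned per file.
asr_model.transcribe(['2086-149220-0033.wav'])
```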
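
### Transcribing many audio files

To transcribe every audio file in a directory, run NeMo's `transcribe_speech.py` script, where `[NEMO_GIT_FOLDER]` is the path to a local clone of the NeMo repository:

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="parthiv11/stt_hi_conformer_ctc_large_v2" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
```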
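
If you prefer to stay in Python, a minimal alternative is to gather the file paths yourself and pass them to `asr_model.transcribe` in one call. The helper `list_wav_files` below is a hypothetical convenience function, not part of NeMo; it only collects and sorts the `.wav` files directly inside one directory.

```python
from pathlib import Path


def list_wav_files(audio_dir):
    """Return sorted string paths of all .wav files directly inside audio_dir.

    Hypothetical helper for this sketch (not a NeMo API). Matching is
    case-insensitive, so files such as CLIP.WAV are included; subdirectories
    are not searched.
    """
    directory = Path(audio_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{audio_dir!r} is not a directory")
    return sorted(
        str(p)
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

With the model loaded as in "Automatically instantiate the model", `asr_model.transcribe(list_wav_files("my_audio/"))` returns one result per file, in the same order as the sorted paths.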

### Input

This model accepts 16 kHz (16,000 Hz) mono-channel audio (WAV files) as input.
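
Before transcribing, it can help to confirm that a file matches this format. The sketch below uses Python's standard-library `wave` module, so it only reads uncompressed PCM WAV files. The function name `check_wav_format` and the choice to report problems as a list of messages are assumptions made for illustration; converting non-conforming files (resampling or downmixing) would need a separate tool.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # 16 kHz, as stated in the Input section
EXPECTED_CHANNELS = 1      # mono


def check_wav_format(path):
    """Return a list of format problems for a PCM WAV file (empty if it conforms).

    Hypothetical helper: checks only the sample rate and channel count.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS} (mono)")
    return problems
```

An empty list means the file should be accepted as-is; otherwise each message names the property that needs converting.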

### Output

This model provides transcribed speech as a string for a given audio sample.

## Model Architecture

<ADD SOME INFORMATION ABOUT THE ARCHITECTURE>

## Training

<ADD INFORMATION ABOUT HOW THE MODEL WAS TRAINED - HOW MANY EPOCHS, AMOUNT OF COMPUTE ETC>

### Datasets

<LIST THE NAME AND SPLITS OF DATASETS USED TO TRAIN THIS MODEL (ALONG WITH LANGUAGE AND ANY ADDITIONAL INFORMATION)>

## Performance

<LIST THE SCORES OF THE MODEL - OR USE THE Hugging Face Evaluate LIBRARY TO UPLOAD METRICS>
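
ASR models are usually scored by word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. The template does not name a metric, so this is an assumption. The standalone sketch below computes corpus-level WER with plain whitespace tokenization and no text normalization.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum substitutions + deletions + insertions turning ref_words into hyp_words."""
    # prev[j] = distance between the ref words seen so far and the first j hyp words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_word
                curr[j - 1] + 1,     # insert hyp_word
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words.

    Sketch assuming WER is the reported metric; tokenizes on whitespace only.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    ref_word_count = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_edit_distance(ref_words, hyp_words)
        ref_word_count += len(ref_words)
    if ref_word_count == 0:
        raise ValueError("references contain no words")
    return errors / ref_word_count
```

For example, `word_error_rate(["मैं घर जा रहा हूँ"], ["मैं घर जा रहा"])` counts one deletion against five reference words, giving 0.2. With the Evaluate library installed, `evaluate.load("wer").compute(predictions=hypotheses, references=references)` should report the same corpus-level figure for plain whitespace-separated text.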

## Limitations

<DECLARE ANY POTENTIAL LIMITATIONS OF THE MODEL>

For example:

Since this model was trained on publicly available speech datasets, its performance may degrade on speech that includes technical terms or vernacular the model has not been trained on. The model may also perform worse on accented speech.


## References

<ADD ANY REFERENCES HERE AS NEEDED>

[1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)