alcray committed on
Commit d53fe3b
Parent: 8acc2e3

Add stt_hy_fastconformer_hybrid_large_pc model

Signed-off-by: Alexan <[email protected]>

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.nemo filter=lfs diff=lfs merge=lfs -text
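The new `*.nemo` rule sends NeMo checkpoint archives through Git LFS, which is why the checkpoint added in this commit shows up as a small pointer file. The authoritative way to see which rule applies to a path is `git check-attr filter -- <path>`. As a rough illustration only, the Python sketch below approximates how the slash-free patterns in this hunk select files by matching each pattern against the basename. The helper `is_lfs_tracked` is hypothetical, and it deliberately ignores the fuller gitattributes semantics (directory-anchored patterns such as `saved_model/**/*`, `**`, macro attributes).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Slash-free LFS patterns visible in this hunk; Git matches such patterns
# against the file's basename at any directory depth.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.nemo"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check (hypothetical helper): does any pattern match the basename?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)


for f in ["stt_hy_fastconformer_hybrid_large_pc.nemo", "README.md", "runs/events.out.tfevents.1"]:
    print(f, is_lfs_tracked(f))
```

Before this commit, a `.nemo` file matched none of the rules shown here and would have been committed as a regular blob.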
README.md CHANGED
@@ -1,3 +1,150 @@
- ---
  license: cc-by-4.0
- ---
+
+ datasets:
+ - mozilla-foundation/common_voice_17_0
+ - google/fleurs
+
+ language:
+ - hy
+
+ pipeline_tag: automatic-speech-recognition
+
+ library_name: NeMo
+
+ metrics:
+ - WER
+ - CER
+
+ tags:
+ - speech-recognition
+ - ASR
+ - Armenian
+ - Conformer
+ - Transducer
+ - CTC
+ - NeMo
+ - hf-asr-leaderboard
+ - speech
+ - audio
+
+ model-index:
+ - name: stt_hy_fastconformer_hybrid_large_pc
+ results:
+ - task:
+ name: Automatic Speech Recognition
+ type: automatic-speech-recognition
+ dataset:
+ name: MCV17
+ type: mozilla-foundation/common_voice_17_0
+ split: test
+ args:
+ language: hy
+ metrics:
+ - name: Test WER
+ type: wer
+ value: 9.90
+ - task:
+ name: Automatic Speech Recognition
+ type: automatic-speech-recognition
+ dataset:
+ name: FLEURS
+ type: google/fleurs
+ split: test
+ args:
+ language: hy
+ metrics:
+ - name: Test WER
+ type: wer
+ value: 12.32
+
+ model-details:
+ name: NVIDIA FastConformer-Hybrid Large (hy)
+ description: |
+ This model transcribes speech in the Armenian language with capitalization and punctuation marks support. It is a "large" version of the FastConformer Transducer-CTC model with 115M parameters, trained on Transducer (default) and CTC losses.
+ license: cc-by-4.0
+ architecture: FastConformer-Hybrid
+ tokenizer:
+ type: SentencePiece
+ vocab_size: 1024
+
+ inputs:
+ type: audio
+ format: wav
+ properties:
+ - 16000 Hz Mono-channel Audio
+ - Pre-Processing Not Needed
+
+ outputs:
+ type: text
+ format: string
+ properties:
+ - Armenian text with punctuation and capitalization
+ - May need inverse text normalization
+ - Does not handle special characters
+
+ limitations:
+ - Non-streaming model
+ - Accuracy depends on input audio characteristics
+ - Not recommended for word-for-word transcription
+ - Limited domain-specific vocabulary
+
+ usage:
+ framework: NeMo
+ pre-trained-model: nvidia/stt_hy_fastconformer_hybrid_large_pc
+ code:
+ - import nemo.collections.asr as nemo_asr
+ - asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="nvidia/stt_hy_fastconformer_hybrid_large_pc")
+ - asr_model.transcribe(['your_audio_file.wav'])
+
+ training:
+ epochs: 200
+ dataset:
+ total_hours: 296.19
+ sources:
+ - Mozilla Common Voice 17.0 (48h)
+ - Google Fleurs (12h)
+ - ArmenianGrqaserAudioBooks (21.96h)
+ - Proprietary Corpus 1 (69.23h)
+ - Proprietary Corpus 2 (145h)
+
+ evaluation:
+ datasets:
+ - Mozilla Common Voice 17.0
+ - Google Fleurs
+ - Proprietary Corpus 1
+ metrics:
+ WER:
+ - MCV Test WER: 9.90
+ - FLEURS Test WER: 12.32
+ CER: Not provided
+
+ deployment:
+ hardware:
+ - NVIDIA Ampere
+ - NVIDIA Blackwell
+ - NVIDIA Jetson
+ - NVIDIA Hopper
+ - NVIDIA Lovelace
+ - NVIDIA Pascal
+ - NVIDIA Turing
+ - NVIDIA Volta
+ runtime: NeMo 2.0.0
+ os: Linux
+
+ ethical-considerations:
+ trustworthy-ai:
+ considerations: Ensure model meets requirements for relevant industries and addresses misuse.
+ explainability:
+ application: Automatic Speech Recognition
+ performance:
+ - WER
+ - CER
+ - Real-Time Factor
+ risks:
+ - Accuracy may vary with input characteristics.
+ privacy:
+ compliance: Reviewed for privacy laws
+ personal-data: No identifiable personal data
+ safety:
+ use-cases: Not applicable for life-critical applications.
+ noise-sensitivity: Sensitive to noise and input variations.
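The card reports test WER of 9.90 on MCV17 and 12.32 on FLEURS (presumably percentages) and lists CER as a metric, but it does not say how transcripts were normalized before scoring, for example whether punctuation and capitalization were stripped from this punctuated, cased model's output. For reference, a minimal Python sketch of the standard definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same computation over characters. The function names are hypothetical, and the whitespace tokenization (and keeping spaces in CER) is an assumption, not the evaluation setup used for this model.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two token sequences (sub, del, ins each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 when tokens match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance; max(..., 1) guards an empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are kept as characters (an assumption)."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word edits over total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / max(words, 1)


print(wer("the cat sat", "the cat sit"))  # one substitution over three reference words
```

Test-set figures are usually computed at corpus level (summing edits and reference words across utterances, as `corpus_wer` does) rather than by averaging per-utterance WER, and multiplying by 100 gives the percentage form used in the metadata.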
stt_hy_fastconformer_hybrid_large_pc.nemo ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b75fb9203d48c1a50db3ab6890df0f3d85086a14d9547940adac69c1deaa20eb
+ size 459243520
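Because `*.nemo` is now tracked by LFS, the repository stores only this three-line pointer, and the roughly 459 MB checkpoint is fetched separately. A minimal Python sketch that parses a pointer in the format shown above (`version`, `oid sha256:...`, `size`) and checks a downloaded file against it. The helper names are hypothetical, and the commented-out call assumes the checkpoint has been downloaded to the current directory under its repository filename.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare the size first (cheap), then stream the file through SHA-256."""
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b75fb9203d48c1a50db3ab6890df0f3d85086a14d9547940adac69c1deaa20eb
size 459243520
"""

ptr = parse_lfs_pointer(POINTER)
print(ptr["size"])  # 459243520
# matches_pointer("stt_hy_fastconformer_hybrid_large_pc.nemo", ptr)  # assumed local download
```

Checking the size before hashing lets a truncated or partial download fail immediately without reading the whole file.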