Automatic Speech Recognition
ESPnet
Guanano
audio
akreal committed on
Commit 08de618
1 Parent(s): c09853e

Update model

README.md CHANGED
@@ -1,3 +1,390 @@
  ---
- license: apache-2.0
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: gvc
+ datasets:
+ - americasnlp22
+ license: cc-by-4.0
  ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/americasnlp22-asr-gvc`
+
+ This model was trained by Pavel Denisov using the americasnlp22 recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 66ca5df9f08b6084dbde4d9f312fa8ba0a47ecfc
+ pip install -e .
+ cd egs2/americasnlp22/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/americasnlp22-asr-gvc
+ ```
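+
+ The packed model can also be used directly from Python. The snippet below is a minimal sketch, assuming the `espnet_model_zoo` and `soundfile` packages are installed; `sample.wav` is a hypothetical 16 kHz mono recording you supply.
+
+ ```python
+ # Minimal inference sketch (not part of the recipe). Assumes espnet_model_zoo and
+ # soundfile are installed; "sample.wav" is a placeholder for your own 16 kHz file.
+ import soundfile as sf
+ from espnet2.bin.asr_inference import Speech2Text
+
+ # Fetches the packed model from the Hugging Face Hub on first use.
+ speech2text = Speech2Text.from_pretrained("espnet/americasnlp22-asr-gvc")
+
+ speech, rate = sf.read("sample.wav")
+ nbests = speech2text(speech)
+ text, tokens, token_ids, hyp = nbests[0]  # best hypothesis first
+ print(text)
+ ```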
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Jun 5 03:29:33 CEST 2022`
+ - python version: `3.9.13 (main, May 18 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]`
+ - espnet version: `espnet 202204`
+ - pytorch version: `pytorch 1.11.0+cu115`
+ - Git hash: `d55704daa36d3dd2ca24ae3162ac40d81957208c`
+ - Commit date: `Wed Jun 1 02:33:09 2022 +0200`
+
+ ## asr_train_asr_transformer_raw_gvc_bpe100_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.cer_ctc.best/dev_gvc|253|2206|12.4|72.4|15.1|6.7|94.2|99.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.cer_ctc.best/dev_gvc|253|13453|64.7|15.5|19.9|10.2|45.6|99.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.cer_ctc.best/dev_gvc|253|10229|58.3|22.3|19.4|11.0|52.7|99.6|
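+
+ For reference, the Err column in these tables is the usual edit-distance error rate: (Sub + Del + Ins) divided by the reference length, computed over words for WER, characters for CER, and BPE tokens for TER. A minimal self-contained sketch (the example strings below are generic placeholders, not taken from the corpus):
+
+ ```python
+ # Edit-distance error rate: splitting on whitespace gives WER, list(text) gives CER,
+ # and SentencePiece pieces give TER.
+ def error_rate(ref, hyp):
+     n, m = len(ref), len(hyp)
+     dp = [[0] * (m + 1) for _ in range(n + 1)]
+     for i in range(n + 1):
+         dp[i][0] = i  # i deletions
+     for j in range(m + 1):
+         dp[0][j] = j  # j insertions
+     for i in range(1, n + 1):
+         for j in range(1, m + 1):
+             cost = 0 if ref[i - 1] == hyp[j - 1] else 1
+             dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
+                            dp[i][j - 1] + 1,          # insertion
+                            dp[i - 1][j - 1] + cost)   # match / substitution
+     return dp[n][m] / max(n, 1)
+
+ print(error_rate("a b c d".split(), "a x d".split()))  # 2 errors / 4 words = 0.5
+ ```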
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_asr_transformer.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_transformer_raw_gvc_bpe100_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer_ctc
+     - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream.model.feature_extractor
+ - frontend.upstream.model.encoder.layers.0
+ - frontend.upstream.model.encoder.layers.1
+ - frontend.upstream.model.encoder.layers.2
+ - frontend.upstream.model.encoder.layers.3
+ - frontend.upstream.model.encoder.layers.4
+ - frontend.upstream.model.encoder.layers.5
+ - frontend.upstream.model.encoder.layers.6
+ - frontend.upstream.model.encoder.layers.7
+ - frontend.upstream.model.encoder.layers.8
+ - frontend.upstream.model.encoder.layers.9
+ - frontend.upstream.model.encoder.layers.10
+ - frontend.upstream.model.encoder.layers.11
+ - frontend.upstream.model.encoder.layers.12
+ - frontend.upstream.model.encoder.layers.13
+ - frontend.upstream.model.encoder.layers.14
+ - frontend.upstream.model.encoder.layers.15
+ - frontend.upstream.model.encoder.layers.16
+ - frontend.upstream.model.encoder.layers.17
+ - frontend.upstream.model.encoder.layers.18
+ - frontend.upstream.model.encoder.layers.19
+ - frontend.upstream.model.encoder.layers.20
+ - frontend.upstream.model.encoder.layers.21
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 200000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_gvc_bpe100_sp/train/speech_shape
+ - exp/asr_stats_raw_gvc_bpe100_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_gvc_bpe100_sp/valid/speech_shape
+ - exp/asr_stats_raw_gvc_bpe100_sp/valid/text_shape.bpe
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_gvc_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_gvc_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_gvc/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_gvc/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adamw
+ optim_conf:
+     lr: 0.0001
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 300
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁
+ - a
+ - ''''
+ - u
+ - i
+ - o
+ - h
+ - U
+ - .
+ - ro
+ - re
+ - ri
+ - ka
+ - s
+ - na
+ - p
+ - e
+ - ▁ti
+ - t
+ - ':'
+ - d
+ - ha
+ - 'no'
+ - ▁hi
+ - m
+ - ▁ni
+ - '~'
+ - ã
+ - ta
+ - ▁wa
+ - ti
+ - ','
+ - ▁to
+ - b
+ - n
+ - ▁kh
+ - ma
+ - r
+ - se
+ - w
+ - l
+ - k
+ - '"'
+ - ñ
+ - õ
+ - g
+ - (
+ - )
+ - v
+ - f
+ - '?'
+ - A
+ - K
+ - z
+ - é
+ - T
+ - '!'
+ - D
+ - ó
+ - N
+ - á
+ - R
+ - P
+ - ú
+ - '0'
+ - í
+ - I
+ - '1'
+ - L
+ - '-'
+ - '8'
+ - E
+ - S
+ - Ã
+ - F
+ - '9'
+ - '6'
+ - G
+ - C
+ - x
+ - '3'
+ - '2'
+ - B
+ - W
+ - J
+ - H
+ - Y
+ - M
+ - j
+ - ç
+ - q
+ - c
+ - Ñ
+ - '4'
+ - '7'
+ - O
+ - y
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/gvc_token_list/bpe_unigram100/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wav2vec2_url
+         upstream_ckpt: https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr2_300m.pt
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: null
+ specaug_conf: {}
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 1.0
+     lsm_weight: 0.0
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     input_layer: conv2d2
+     num_blocks: 1
+     linear_units: 2048
+     dropout_rate: 0.2
+     output_size: 256
+     attention_heads: 8
+     attention_dropout_rate: 0.2
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202204'
+ distributed: false
+ ```
+
+ </details>
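+
+ As the `freeze_param` block above shows, fine-tuning updates only part of the network: the XLS-R feature extractor and encoder layers 0-21 stay frozen, while the remaining top upstream layers, the linear preencoder, the one-block Transformer encoder and the CTC output layer are trained. A small sketch (not part of the recipe) that generates the same list programmatically:
+
+ ```python
+ # Generate the freeze_param entries instead of listing them by hand:
+ # the upstream feature extractor plus encoder layers 0-21 are kept frozen.
+ frozen = ["frontend.upstream.model.feature_extractor"] + [
+     f"frontend.upstream.model.encoder.layers.{i}" for i in range(22)
+ ]
+ print("\n".join(f"- {name}" for name in frozen))
+ ```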
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
data/gvc_token_list/bpe_unigram100/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aad1e7ccd45ec389cd5d4be505b40795e681886104e335884e8dccc304253669
+ size 238744
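
The added `bpe.model` is the SentencePiece unigram model (100 pieces) referenced by `bpemodel:` in the config above; it is stored via Git LFS, so only the pointer appears in the diff. A minimal sketch of using it, assuming the `sentencepiece` package is installed and the file has been fetched locally:

```python
# Tokenize text with the packaged SentencePiece model (assumes sentencepiece is
# installed and the file was pulled via git-lfs); the input string is a placeholder.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="data/gvc_token_list/bpe_unigram100/bpe.model")
pieces = sp.encode("example transcription text", out_type=str)
print(pieces)              # subword pieces from the 100-piece vocabulary
print(sp.decode(pieces))   # round-trips back to the original string
```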
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/6epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c7a2941722f353b32fdf18931e3d6c2b6eedc2390e281a70068aadff12c97c6
+ size 1287519213
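
`6epoch.pth` is the checkpoint that `meta.yaml` below points at (about 1.3 GB, also stored via Git LFS). A quick inspection sketch, assuming the file has been fetched locally and is a plain PyTorch state dict, which is how ESPnet2 typically writes its per-epoch checkpoints:

```python
# Peek at the uploaded checkpoint (assumes git-lfs has materialized the file).
import torch

state = torch.load(
    "exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/6epoch.pth", map_location="cpu"
)
print(len(state), "parameter tensors")
for name in list(state)[:5]:          # show a few entries
    print(name, tuple(state[name].shape))
```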
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/RESULTS.md ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/config.yaml ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/acc.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/backward_time.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/cer.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/forward_time.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/iter_time.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/loss.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/loss_att.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/train_time.png ADDED
exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/images/wer.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202204'
+ files:
+   asr_model_file: exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/6epoch.pth
+ python: "3.9.13 (main, May 18 2022, 00:00:00) \n[GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]"
+ timestamp: 1654392574.992199
+ torch: 1.11.0+cu115
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_transformer_raw_gvc_bpe100_sp/config.yaml
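
`meta.yaml` ties the packed model together: it records which checkpoint and training config belong to the `espnet/americasnlp22-asr-gvc` tag. A sketch of resolving those paths with `espnet_model_zoo`, assuming that package is installed (the returned key names follow the `Speech2Text` keyword arguments):

```python
# Resolve the packed files; download_and_unpack returns the paths recorded in
# meta.yaml, which can be passed to espnet2.bin.asr_inference.Speech2Text.
from espnet_model_zoo.downloader import ModelDownloader

d = ModelDownloader()
files = d.download_and_unpack("espnet/americasnlp22-asr-gvc")
print(files["asr_train_config"])  # .../config.yaml
print(files["asr_model_file"])    # .../6epoch.pth
```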