akreal committed on
Commit
365235a
·
unverified ·
1 Parent(s): 537e2bb

Update model

README.md CHANGED
@@ -1,3 +1,423 @@
1
  ---
2
- license: apache-2.0
3
  ---
1
  ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - automatic-speech-recognition
6
+ language: en
7
+ datasets:
8
+ - libritts
9
+ license: cc-by-4.0
10
  ---
11
+
12
+ ## ESPnet2 ASR model
13
+
14
+ ### `espnet/akreal_libritts_asr_phn`
15
+
16
+ This model was trained by Pavel Denisov using the libritts recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ ```bash
24
+ cd espnet
25
+ git checkout 3800a13ae8972839d506b85585c41e6b24daf812
26
+ pip install -e .
27
+ cd egs2/libritts/asr1
28
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/akreal_libritts_asr_phn
29
+ ```
30
+
31
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
32
+ # RESULTS
33
+ ## Environments
34
+ - date: `Sat Oct 14 03:02:09 CEST 2023`
35
+ - python version: `3.10.8 (main, Nov 14 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-3)]`
36
+ - espnet version: `espnet 202308`
37
+ - pytorch version: `pytorch 2.0.1+cu118`
38
+ - Git hash: `3800a13ae8972839d506b85585c41e6b24daf812`
39
+ - Commit date: `Sun Oct 8 17:51:17 2023 +0200`
40
+
41
+ ## exp/asr_train_asr_raw_en_bpe100_sp
42
+ ### WER
43
+
44
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
45
+ |---|---|---|---|---|---|---|---|---|
46
+ |decode_asr_asr_model_valid.acc.ave/dev-clean|5736|95872|91.7|8.0|0.4|0.8|9.1|67.0|
47
+ |decode_asr_asr_model_valid.acc.ave/dev-other|4613|69577|88.5|10.9|0.6|1.2|12.7|74.2|
48
+ |decode_asr_asr_model_valid.acc.ave/test-clean|4837|87078|91.4|8.2|0.4|0.8|9.4|70.4|
49
+ |decode_asr_asr_model_valid.acc.ave/test-other|5120|72541|87.0|12.2|0.8|1.1|14.1|77.1|
50
+
51
+ ### CER
52
+
53
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
54
+ |---|---|---|---|---|---|---|---|---|
55
+ |decode_asr_asr_model_valid.acc.ave/dev-clean|5736|570710|98.4|0.8|0.9|0.6|2.2|67.1|
56
+ |decode_asr_asr_model_valid.acc.ave/dev-other|4613|414781|97.2|1.6|1.2|1.0|3.8|74.2|
57
+ |decode_asr_asr_model_valid.acc.ave/test-clean|4837|530647|98.5|0.7|0.8|0.6|2.2|70.5|
58
+ |decode_asr_asr_model_valid.acc.ave/test-other|5120|429463|96.7|1.7|1.6|1.0|4.3|77.1|
59
+
60
+ ### TER
61
+
62
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
63
+ |---|---|---|---|---|---|---|---|---|
64
+ |decode_asr_asr_model_valid.acc.ave/dev-clean|5736|433548|97.6|1.4|1.0|0.6|3.0|67.1|
65
+ |decode_asr_asr_model_valid.acc.ave/dev-other|4613|316550|96.1|2.5|1.4|1.0|5.0|74.2|
66
+ |decode_asr_asr_model_valid.acc.ave/test-clean|4837|404031|97.7|1.4|0.9|0.7|2.9|70.5|
67
+ |decode_asr_asr_model_valid.acc.ave/test-other|5120|327248|95.4|2.8|1.8|1.1|5.7|77.1|
68
+
69
+ ## ASR config
70
+
71
+ <details><summary>expand</summary>
72
+
73
+ ```
74
+ config: conf/train_asr.yaml
75
+ print_config: false
76
+ log_level: INFO
77
+ drop_last_iter: false
78
+ dry_run: false
79
+ iterator_type: sequence
80
+ valid_iterator_type: null
81
+ output_dir: exp/asr_train_asr_raw_en_bpe100_sp
82
+ ngpu: 1
83
+ seed: 0
84
+ num_workers: 1
85
+ num_att_plot: 3
86
+ dist_backend: nccl
87
+ dist_init_method: env://
88
+ dist_world_size: 2
89
+ dist_rank: 0
90
+ local_rank: 0
91
+ dist_master_addr: localhost
92
+ dist_master_port: 58465
93
+ dist_launcher: null
94
+ multiprocessing_distributed: true
95
+ unused_parameters: true
96
+ sharded_ddp: false
97
+ cudnn_enabled: true
98
+ cudnn_benchmark: false
99
+ cudnn_deterministic: true
100
+ collect_stats: false
101
+ write_collected_feats: false
102
+ max_epoch: 70
103
+ patience: null
104
+ val_scheduler_criterion:
105
+ - valid
106
+ - loss
107
+ early_stopping_criterion:
108
+ - valid
109
+ - loss
110
+ - min
111
+ best_model_criterion:
112
+ -   - valid
113
+     - acc
114
+     - max
115
+ keep_nbest_models: 10
116
+ nbest_averaging_interval: 0
117
+ grad_clip: 5.0
118
+ grad_clip_type: 2.0
119
+ grad_noise: false
120
+ accum_grad: 8
121
+ no_forward_run: false
122
+ resume: true
123
+ train_dtype: float32
124
+ use_amp: true
125
+ log_interval: null
126
+ use_matplotlib: true
127
+ use_tensorboard: true
128
+ create_graph_in_tensorboard: false
129
+ use_wandb: false
130
+ wandb_project: null
131
+ wandb_id: null
132
+ wandb_entity: null
133
+ wandb_name: null
134
+ wandb_model_log_interval: -1
135
+ detect_anomaly: false
136
+ pretrain_path: null
137
+ init_param: []
138
+ ignore_init_mismatch: false
139
+ freeze_param: []
140
+ num_iters_per_epoch: null
141
+ batch_size: 20
142
+ valid_batch_size: null
143
+ batch_bins: 10000000
144
+ valid_batch_bins: null
145
+ train_shape_file:
146
+ - exp/asr_stats_raw_en_bpe100_sp/train/speech_shape
147
+ - exp/asr_stats_raw_en_bpe100_sp/train/text_shape.bpe
148
+ valid_shape_file:
149
+ - exp/asr_stats_raw_en_bpe100_sp/valid/speech_shape
150
+ - exp/asr_stats_raw_en_bpe100_sp/valid/text_shape.bpe
151
+ batch_type: numel
152
+ valid_batch_type: null
153
+ fold_length:
154
+ - 80000
155
+ - 150
156
+ sort_in_batch: descending
157
+ shuffle_within_batch: false
158
+ sort_batch: descending
159
+ multiple_iterator: false
160
+ chunk_length: 500
161
+ chunk_shift_ratio: 0.5
162
+ num_cache_chunks: 1024
163
+ chunk_excluded_key_prefixes: []
164
+ train_data_path_and_name_and_type:
165
+ -   - dump/raw/train-960_sp/wav.scp
166
+     - speech
167
+     - kaldi_ark
168
+ -   - dump/raw/train-960_sp/text
169
+     - text
170
+     - text
171
+ valid_data_path_and_name_and_type:
172
+ -   - dump/raw/dev/wav.scp
173
+     - speech
174
+     - kaldi_ark
175
+ -   - dump/raw/dev/text
176
+     - text
177
+     - text
178
+ allow_variable_data_keys: false
179
+ max_cache_size: 0.0
180
+ max_cache_fd: 32
181
+ valid_max_cache_size: null
182
+ exclude_weight_decay: false
183
+ exclude_weight_decay_conf: {}
184
+ optim: adam
185
+ optim_conf:
186
+     lr: 0.005
187
+     weight_decay: 1.0e-06
188
+ scheduler: warmuplr
189
+ scheduler_conf:
190
+     warmup_steps: 40000
191
+ token_list:
192
+ - <blank>
193
+ - <unk>
194
+ - ˈ
195
+ - ː
196
+ - ▁
197
+ - ɹ
198
+ - t
199
+ - d
200
+ - ɪ
201
+ - i
202
+ - ˌ
203
+ - ɛ
204
+ - ','
205
+ - s
206
+ - l
207
+ - n
208
+ - k
209
+ - z
210
+ - m
211
+ - ▁s
212
+ - eɪ
213
+ - ʌ
214
+ - aɪ
215
+ - .
216
+ - ɔ
217
+ - æ
218
+ - ɚ
219
+ - oʊ
220
+ - ɑ
221
+ - ▁w
222
+ - ▁h
223
+ - v
224
+ - ▁b
225
+ - ▁m
226
+ - p
227
+ - u
228
+ - ə
229
+ - ▁f
230
+ - ▁k
231
+ - ▁ɐ
232
+ - ▁ðə
233
+ - ən
234
+ - ▁ð
235
+ - ▁ˈ
236
+ - ɪŋ
237
+ - ▁ænd
238
+ - ɜ
239
+ - f
240
+ - ʊ
241
+ - ▁p
242
+ - ɾ
243
+ - ▁ʌv
244
+ - ▁d
245
+ - st
246
+ - ▁tə
247
+ - ɛn
248
+ - ▁l
249
+ - aʊ
250
+ - əl
251
+ - b
252
+ - ▁n
253
+ - ʃ
254
+ - ▁t
255
+ - tʃ
256
+ - ▁ɪn
257
+ - ▁ɡ
258
+ - ðə
259
+ - ɪn
260
+ - ▁ɹ
261
+ - θ
262
+ - w
263
+ - '"'
264
+ - ▁j
265
+ - dʒ
266
+ - æn
267
+ - ▁"
268
+ - ɡ
269
+ - ð
270
+ - o
271
+ - ɐ
272
+ - j
273
+ - ŋ
274
+ - ;
275
+ - '?'
276
+ - '!'
277
+ - h
278
+ - ':'
279
+ - ʒ
280
+ - ʔ
281
+ - r
282
+ - —
283
+ - ɬ
284
+ - x
285
+ - ç
286
+ - ̩
287
+ - ᵻ
288
+ - e
289
+ - a
290
+ - ̃
291
+ - <sos/eos>
292
+ init: null
293
+ input_size: null
294
+ ctc_conf:
295
+     dropout_rate: 0.0
296
+     ctc_type: builtin
297
+     reduce: true
298
+     ignore_nan_grad: null
299
+     zero_infinity: true
300
+ joint_net_conf: null
301
+ use_preprocessor: true
302
+ token_type: bpe
303
+ bpemodel: data/en_token_list/bpe_unigram100/bpe.model
304
+ non_linguistic_symbols: null
305
+ cleaner: null
306
+ g2p: null
307
+ speech_volume_normalize: null
308
+ rir_scp: null
309
+ rir_apply_prob: 1.0
310
+ noise_scp: null
311
+ noise_apply_prob: 1.0
312
+ noise_db_range: '13_15'
313
+ short_noise_thres: 0.5
314
+ aux_ctc_tasks: []
315
+ frontend: default
316
+ frontend_conf:
317
+     n_fft: 512
318
+     hop_length: 160
319
+     fs: 16k
320
+ specaug: specaug
321
+ specaug_conf:
322
+     apply_time_warp: true
323
+     time_warp_window: 5
324
+     time_warp_mode: bicubic
325
+     apply_freq_mask: true
326
+     freq_mask_width_range:
327
+     - 0
328
+     - 27
329
+     num_freq_mask: 2
330
+     apply_time_mask: true
331
+     time_mask_width_ratio_range:
332
+     - 0.0
333
+     - 0.05
334
+     num_time_mask: 10
335
+ normalize: global_mvn
336
+ normalize_conf:
337
+     stats_file: exp/asr_stats_raw_en_bpe100_sp/train/feats_stats.npz
338
+ model: espnet
339
+ model_conf:
340
+     ctc_weight: 0.6
341
+     lsm_weight: 0.1
342
+     length_normalized_loss: false
343
+ preencoder: null
344
+ preencoder_conf: {}
345
+ encoder: e_branchformer
346
+ encoder_conf:
347
+     output_size: 512
348
+     attention_heads: 8
349
+     attention_layer_type: rel_selfattn
350
+     pos_enc_layer_type: rel_pos
351
+     rel_pos_type: latest
352
+     cgmlp_linear_units: 3072
353
+     cgmlp_conv_kernel: 31
354
+     use_linear_after_conv: false
355
+     gate_activation: identity
356
+     num_blocks: 17
357
+     dropout_rate: 0.1
358
+     positional_dropout_rate: 0.1
359
+     attention_dropout_rate: 0.1
360
+     input_layer: conv2d
361
+     layer_drop_rate: 0.1
362
+     linear_units: 1024
363
+     positionwise_layer_type: linear
364
+     macaron_ffn: true
365
+     use_ffn: true
366
+     merge_conv_kernel: 31
367
+ postencoder: null
368
+ postencoder_conf: {}
369
+ decoder: transformer
370
+ decoder_conf:
371
+     attention_heads: 8
372
+     linear_units: 2048
373
+     num_blocks: 6
374
+     dropout_rate: 0.1
375
+     positional_dropout_rate: 0.1
376
+     self_attention_dropout_rate: 0.1
377
+     src_attention_dropout_rate: 0.1
378
+     layer_drop_rate: 0.2
379
+ preprocessor: default
380
+ preprocessor_conf: {}
381
+ required:
382
+ - output_dir
383
+ - token_list
384
+ version: '202308'
385
+ distributed: true
386
+ ```
387
+
388
+ </details>
389
+
390
+
391
+
392
+ ### Citing ESPnet
393
+
394
+ ```bibtex
395
+ @inproceedings{watanabe2018espnet,
396
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
397
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
398
+ year={2018},
399
+ booktitle={Proceedings of Interspeech},
400
+ pages={2207--2211},
401
+ doi={10.21437/Interspeech.2018-1456},
402
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
403
+ }
404
+
405
+
406
+
407
+
408
+
409
+
410
+ ```
411
+
412
+ or arXiv:
413
+
414
+ ```bibtex
415
+ @misc{watanabe2018espnet,
416
+ title={ESPnet: End-to-End Speech Processing Toolkit},
417
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
418
+ year={2018},
419
+ eprint={1804.00015},
420
+ archivePrefix={arXiv},
421
+ primaryClass={cs.CL}
422
+ }
423
+ ```
data/en_token_list/bpe_unigram100/bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:13cc77b6dede0a6c3499fc7735684d17f4aabc1088fe44e816565d6d6616512e
3
+ size 238845
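The `bpe_unigram100` model splits phoneme strings into subword pieces, and the `token_list` in this commit marks word starts with `▁`. A minimal sketch of turning a piece sequence back into space-separated phoneme words is shown here, assuming the usual SentencePiece convention that `▁` stands for a preceding space. The helper name `pieces_to_text` and the example sequence are hypothetical; only the individual pieces come from the token list.

```python
# Hypothetical helper, not part of ESPnet: joins SentencePiece-style pieces.
WORD_BOUNDARY = "\u2581"  # the "▁" marker used in token_list


def pieces_to_text(pieces):
    """Concatenate pieces and turn each word-boundary marker into a space."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Each piece below appears in token_list; the sequence itself is made up.
example = ["▁ðə", "▁k", "æ", "t"]
print(pieces_to_text(example))
```

In practice the SentencePiece model file itself would do this decoding; the sketch only illustrates what the marker means.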
exp/asr_stats_raw_en_bpe100_sp/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ddbeb32cd5e426335c590012940e0a8b4ba3393a600db96d8c71e95625177f08
3
+ size 1402
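This statistics file is what `normalize: global_mvn` reads through `stats_file`. The sketch below shows how global mean-variance normalization can be derived from accumulated statistics, assuming the file holds a frame count, a per-dimension sum, and a per-dimension sum of squares (an assumption about the file layout, not something this commit shows). The function names and the two-dimensional toy features are hypothetical.

```python
import math


def global_mvn_stats(count, feat_sum, feat_sum_square, eps=1.0e-20):
    """Per-dimension mean and standard deviation from accumulated sums."""
    mean = [s / count for s in feat_sum]
    var = [sq / count - m * m for sq, m in zip(feat_sum_square, mean)]
    std = [math.sqrt(max(v, eps)) for v in var]  # eps guards against var <= 0
    return mean, std


def normalize(frame, mean, std):
    """Shift and scale one feature frame to zero mean and unit variance."""
    return [(x - m) / s for x, m, s in zip(frame, mean, std)]


# Toy data standing in for real features (two frames, two dimensions).
frames = [[1.0, 10.0], [3.0, 14.0]]
count = len(frames)
feat_sum = [sum(col) for col in zip(*frames)]
feat_sum_square = [sum(x * x for x in col) for col in zip(*frames)]

mean, std = global_mvn_stats(count, feat_sum, feat_sum_square)
print(mean, std)
print(normalize(frames[0], mean, std))
```

Storing sums rather than the mean and variance directly lets the statistics be accumulated over the training set in a single pass.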
exp/asr_train_asr_raw_en_bpe100_sp/RESULTS.md ADDED
@@ -0,0 +1,38 @@
1
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Sat Oct 14 03:02:09 CEST 2023`
5
+ - python version: `3.10.8 (main, Nov 14 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-3)]`
6
+ - espnet version: `espnet 202308`
7
+ - pytorch version: `pytorch 2.0.1+cu118`
8
+ - Git hash: `3800a13ae8972839d506b85585c41e6b24daf812`
9
+ - Commit date: `Sun Oct 8 17:51:17 2023 +0200`
10
+
11
+ ## exp/asr_train_asr_raw_en_bpe100_sp
12
+ ### WER
13
+
14
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
15
+ |---|---|---|---|---|---|---|---|---|
16
+ |decode_asr_asr_model_valid.acc.ave/dev-clean|5736|95872|91.7|8.0|0.4|0.8|9.1|67.0|
17
+ |decode_asr_asr_model_valid.acc.ave/dev-other|4613|69577|88.5|10.9|0.6|1.2|12.7|74.2|
18
+ |decode_asr_asr_model_valid.acc.ave/test-clean|4837|87078|91.4|8.2|0.4|0.8|9.4|70.4|
19
+ |decode_asr_asr_model_valid.acc.ave/test-other|5120|72541|87.0|12.2|0.8|1.1|14.1|77.1|
20
+
21
+ ### CER
22
+
23
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
24
+ |---|---|---|---|---|---|---|---|---|
25
+ |decode_asr_asr_model_valid.acc.ave/dev-clean|5736|570710|98.4|0.8|0.9|0.6|2.2|67.1|
26
+ |decode_asr_asr_model_valid.acc.ave/dev-other|4613|414781|97.2|1.6|1.2|1.0|3.8|74.2|
27
+ |decode_asr_asr_model_valid.acc.ave/test-clean|4837|530647|98.5|0.7|0.8|0.6|2.2|70.5|
28
+ |decode_asr_asr_model_valid.acc.ave/test-other|5120|429463|96.7|1.7|1.6|1.0|4.3|77.1|
29
+
30
+ ### TER
31
+
32
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
33
+ |---|---|---|---|---|---|---|---|---|
34
+ |decode_asr_asr_model_valid.acc.ave/dev-clean|5736|433548|97.6|1.4|1.0|0.6|3.0|67.1|
35
+ |decode_asr_asr_model_valid.acc.ave/dev-other|4613|316550|96.1|2.5|1.4|1.0|5.0|74.2|
36
+ |decode_asr_asr_model_valid.acc.ave/test-clean|4837|404031|97.7|1.4|0.9|0.7|2.9|70.5|
37
+ |decode_asr_asr_model_valid.acc.ave/test-other|5120|327248|95.4|2.8|1.8|1.1|5.7|77.1|
38
+
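The columns in these tables are related: the `Err` percentage is the sum of substitutions, deletions, and insertions, each relative to the reference length. Because every column is rounded to one decimal, the recomputed sum can differ from the reported `Err` by a small amount. The sketch below checks this for the WER rows; the function names are illustrative only.

```python
# (Sub, Del, Ins, Err) percentages copied from the WER table above.
WER_ROWS = {
    "dev-clean": (8.0, 0.4, 0.8, 9.1),
    "dev-other": (10.9, 0.6, 1.2, 12.7),
    "test-clean": (8.2, 0.4, 0.8, 9.4),
    "test-other": (12.2, 0.8, 1.1, 14.1),
}


def error_rate(sub, dele, ins):
    """Error rate as the sum of the three edit-operation percentages."""
    return sub + dele + ins


def max_rounding_gap(rows):
    """Largest absolute difference between recomputed and reported Err."""
    return max(abs(error_rate(s, d, i) - err) for s, d, i, err in rows.values())


for name, (s, d, i, err) in WER_ROWS.items():
    print(f"{name}: recomputed {error_rate(s, d, i):.1f}, reported {err:.1f}")
print(f"max gap: {max_rounding_gap(WER_ROWS):.1f}")
```

A gap of about 0.1, as on dev-clean, is consistent with rounding of the individual columns rather than a scoring discrepancy.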
exp/asr_train_asr_raw_en_bpe100_sp/config.yaml ADDED
@@ -0,0 +1,312 @@
1
+ config: conf/train_asr.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ drop_last_iter: false
5
+ dry_run: false
6
+ iterator_type: sequence
7
+ valid_iterator_type: null
8
+ output_dir: exp/asr_train_asr_raw_en_bpe100_sp
9
+ ngpu: 1
10
+ seed: 0
11
+ num_workers: 1
12
+ num_att_plot: 3
13
+ dist_backend: nccl
14
+ dist_init_method: env://
15
+ dist_world_size: 2
16
+ dist_rank: 0
17
+ local_rank: 0
18
+ dist_master_addr: localhost
19
+ dist_master_port: 58465
20
+ dist_launcher: null
21
+ multiprocessing_distributed: true
22
+ unused_parameters: true
23
+ sharded_ddp: false
24
+ cudnn_enabled: true
25
+ cudnn_benchmark: false
26
+ cudnn_deterministic: true
27
+ collect_stats: false
28
+ write_collected_feats: false
29
+ max_epoch: 70
30
+ patience: null
31
+ val_scheduler_criterion:
32
+ - valid
33
+ - loss
34
+ early_stopping_criterion:
35
+ - valid
36
+ - loss
37
+ - min
38
+ best_model_criterion:
39
+ -   - valid
40
+     - acc
41
+     - max
42
+ keep_nbest_models: 10
43
+ nbest_averaging_interval: 0
44
+ grad_clip: 5.0
45
+ grad_clip_type: 2.0
46
+ grad_noise: false
47
+ accum_grad: 8
48
+ no_forward_run: false
49
+ resume: true
50
+ train_dtype: float32
51
+ use_amp: true
52
+ log_interval: null
53
+ use_matplotlib: true
54
+ use_tensorboard: true
55
+ create_graph_in_tensorboard: false
56
+ use_wandb: false
57
+ wandb_project: null
58
+ wandb_id: null
59
+ wandb_entity: null
60
+ wandb_name: null
61
+ wandb_model_log_interval: -1
62
+ detect_anomaly: false
63
+ pretrain_path: null
64
+ init_param: []
65
+ ignore_init_mismatch: false
66
+ freeze_param: []
67
+ num_iters_per_epoch: null
68
+ batch_size: 20
69
+ valid_batch_size: null
70
+ batch_bins: 10000000
71
+ valid_batch_bins: null
72
+ train_shape_file:
73
+ - exp/asr_stats_raw_en_bpe100_sp/train/speech_shape
74
+ - exp/asr_stats_raw_en_bpe100_sp/train/text_shape.bpe
75
+ valid_shape_file:
76
+ - exp/asr_stats_raw_en_bpe100_sp/valid/speech_shape
77
+ - exp/asr_stats_raw_en_bpe100_sp/valid/text_shape.bpe
78
+ batch_type: numel
79
+ valid_batch_type: null
80
+ fold_length:
81
+ - 80000
82
+ - 150
83
+ sort_in_batch: descending
84
+ shuffle_within_batch: false
85
+ sort_batch: descending
86
+ multiple_iterator: false
87
+ chunk_length: 500
88
+ chunk_shift_ratio: 0.5
89
+ num_cache_chunks: 1024
90
+ chunk_excluded_key_prefixes: []
91
+ train_data_path_and_name_and_type:
92
+ -   - dump/raw/train-960_sp/wav.scp
93
+     - speech
94
+     - kaldi_ark
95
+ -   - dump/raw/train-960_sp/text
96
+     - text
97
+     - text
98
+ valid_data_path_and_name_and_type:
99
+ -   - dump/raw/dev/wav.scp
100
+     - speech
101
+     - kaldi_ark
102
+ -   - dump/raw/dev/text
103
+     - text
104
+     - text
105
+ allow_variable_data_keys: false
106
+ max_cache_size: 0.0
107
+ max_cache_fd: 32
108
+ valid_max_cache_size: null
109
+ exclude_weight_decay: false
110
+ exclude_weight_decay_conf: {}
111
+ optim: adam
112
+ optim_conf:
113
+     lr: 0.005
114
+     weight_decay: 1.0e-06
115
+ scheduler: warmuplr
116
+ scheduler_conf:
117
+     warmup_steps: 40000
118
+ token_list:
119
+ - <blank>
120
+ - <unk>
121
+ - ˈ
122
+ - ː
123
+ - ▁
124
+ - ɹ
125
+ - t
126
+ - d
127
+ - ɪ
128
+ - i
129
+ - ˌ
130
+ - ɛ
131
+ - ','
132
+ - s
133
+ - l
134
+ - n
135
+ - k
136
+ - z
137
+ - m
138
+ - ▁s
139
+ - eɪ
140
+ - ʌ
141
+ - aɪ
142
+ - .
143
+ - ɔ
144
+ - æ
145
+ - ɚ
146
+ - oʊ
147
+ - ɑ
148
+ - ▁w
149
+ - ▁h
150
+ - v
151
+ - ▁b
152
+ - ▁m
153
+ - p
154
+ - u
155
+ - ə
156
+ - ▁f
157
+ - ▁k
158
+ - ▁ɐ
159
+ - ▁ðə
160
+ - ən
161
+ - ▁ð
162
+ - ▁ˈ
163
+ - ɪŋ
164
+ - ▁ænd
165
+ - ɜ
166
+ - f
167
+ - ʊ
168
+ - ▁p
169
+ - ɾ
170
+ - ▁ʌv
171
+ - ▁d
172
+ - st
173
+ - ▁tə
174
+ - ɛn
175
+ - ▁l
176
+ - aʊ
177
+ - əl
178
+ - b
179
+ - ▁n
180
+ - ʃ
181
+ - ▁t
182
+ - tʃ
183
+ - ▁ɪn
184
+ - ▁ɡ
185
+ - ðə
186
+ - ɪn
187
+ - ▁ɹ
188
+ - θ
189
+ - w
190
+ - '"'
191
+ - ▁j
192
+ - dʒ
193
+ - æn
194
+ - ▁"
195
+ - ɡ
196
+ - ð
197
+ - o
198
+ - ɐ
199
+ - j
200
+ - ŋ
201
+ - ;
202
+ - '?'
203
+ - '!'
204
+ - h
205
+ - ':'
206
+ - ʒ
207
+ - ʔ
208
+ - r
209
+ - —
210
+ - ɬ
211
+ - x
212
+ - ç
213
+ - ̩
214
+ - ᵻ
215
+ - e
216
+ - a
217
+ - ̃
218
+ - <sos/eos>
219
+ init: null
220
+ input_size: null
221
+ ctc_conf:
222
+     dropout_rate: 0.0
223
+     ctc_type: builtin
224
+     reduce: true
225
+     ignore_nan_grad: null
226
+     zero_infinity: true
227
+ joint_net_conf: null
228
+ use_preprocessor: true
229
+ token_type: bpe
230
+ bpemodel: data/en_token_list/bpe_unigram100/bpe.model
231
+ non_linguistic_symbols: null
232
+ cleaner: null
233
+ g2p: null
234
+ speech_volume_normalize: null
235
+ rir_scp: null
236
+ rir_apply_prob: 1.0
237
+ noise_scp: null
238
+ noise_apply_prob: 1.0
239
+ noise_db_range: '13_15'
240
+ short_noise_thres: 0.5
241
+ aux_ctc_tasks: []
242
+ frontend: default
243
+ frontend_conf:
244
+     n_fft: 512
245
+     hop_length: 160
246
+     fs: 16k
247
+ specaug: specaug
248
+ specaug_conf:
249
+     apply_time_warp: true
250
+     time_warp_window: 5
251
+     time_warp_mode: bicubic
252
+     apply_freq_mask: true
253
+     freq_mask_width_range:
254
+     - 0
255
+     - 27
256
+     num_freq_mask: 2
257
+     apply_time_mask: true
258
+     time_mask_width_ratio_range:
259
+     - 0.0
260
+     - 0.05
261
+     num_time_mask: 10
262
+ normalize: global_mvn
263
+ normalize_conf:
264
+     stats_file: exp/asr_stats_raw_en_bpe100_sp/train/feats_stats.npz
265
+ model: espnet
266
+ model_conf:
267
+     ctc_weight: 0.6
268
+     lsm_weight: 0.1
269
+     length_normalized_loss: false
270
+ preencoder: null
271
+ preencoder_conf: {}
272
+ encoder: e_branchformer
273
+ encoder_conf:
274
+     output_size: 512
275
+     attention_heads: 8
276
+     attention_layer_type: rel_selfattn
277
+     pos_enc_layer_type: rel_pos
278
+     rel_pos_type: latest
279
+     cgmlp_linear_units: 3072
280
+     cgmlp_conv_kernel: 31
281
+     use_linear_after_conv: false
282
+     gate_activation: identity
283
+     num_blocks: 17
284
+     dropout_rate: 0.1
285
+     positional_dropout_rate: 0.1
286
+     attention_dropout_rate: 0.1
287
+     input_layer: conv2d
288
+     layer_drop_rate: 0.1
289
+     linear_units: 1024
290
+     positionwise_layer_type: linear
291
+     macaron_ffn: true
292
+     use_ffn: true
293
+     merge_conv_kernel: 31
294
+ postencoder: null
295
+ postencoder_conf: {}
296
+ decoder: transformer
297
+ decoder_conf:
298
+     attention_heads: 8
299
+     linear_units: 2048
300
+     num_blocks: 6
301
+     dropout_rate: 0.1
302
+     positional_dropout_rate: 0.1
303
+     self_attention_dropout_rate: 0.1
304
+     src_attention_dropout_rate: 0.1
305
+     layer_drop_rate: 0.2
306
+ preprocessor: default
307
+ preprocessor_conf: {}
308
+ required:
309
+ - output_dir
310
+ - token_list
311
+ version: '202308'
312
+ distributed: true
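The config pairs `optim: adam` (`lr: 0.005`) with `scheduler: warmuplr` (`warmup_steps: 40000`). As a rough illustration of what that combination implies, the sketch below assumes `warmuplr` follows the usual Noam-style rule: the rate rises linearly until `warmup_steps`, peaks at the configured `lr`, and then decays with the inverse square root of the step. The function `warmup_lr` is a hypothetical stand-in written for this note, not ESPnet's own class.

```python
def warmup_lr(step, base_lr=0.005, warmup_steps=40000):
    """Noam-style warmup: linear increase, then inverse-square-root decay.

    Assumed formula:
        lr = base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """
    if step < 1:
        raise ValueError("step counts from 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Learning rate before, at, and after the end of warmup.
for step in (1000, 10000, 40000, 160000):
    print(step, f"{warmup_lr(step):.6f}")
```

Under this assumption the peak rate equals the configured `lr` exactly at step 40000, and quadrupling the step count afterwards halves the rate.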
exp/asr_train_asr_raw_en_bpe100_sp/images/acc.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/backward_time.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/cer.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/clip.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/forward_time.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/grad_norm.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/iter_time.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/loss.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/loss_att.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/loss_scale.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/train_time.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/images/wer.png ADDED
exp/asr_train_asr_raw_en_bpe100_sp/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8a0364752b9eea3a0bfc0533d43d8f4a2dca0b21b17053fe83c2b6a0f1123645
3
+ size 565961992
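The three added lines are a Git LFS pointer rather than the checkpoint itself: they record the spec version, the content hash, and the size in bytes of the real file. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper that assumes the simple `key value` line layout seen in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the checkpoint entry in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8a0364752b9eea3a0bfc0533d43d8f4a2dca0b21b17053fe83c2b6a0f1123645
size 565961992
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"], f"{info['size'] / 1e6:.1f} MB")
```

The digest can be compared against a SHA-256 of the downloaded checkpoint to confirm the file arrived intact.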
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202308'
2
+ files:
3
+   asr_model_file: exp/asr_train_asr_raw_en_bpe100_sp/valid.acc.ave_10best.pth
4
+ python: 3.10.8 (main, Nov 14 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-3)]
5
+ timestamp: 1697281166.784828
6
+ torch: 2.0.1+cu118
7
+ yaml_files:
8
+   asr_train_config: exp/asr_train_asr_raw_en_bpe100_sp/config.yaml
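The `timestamp` field in `meta.yaml` looks like a Unix epoch value in seconds. Assuming that reading is correct, it can be converted to a calendar date for comparison with the dates in the results section; the variable names below are illustrative.

```python
from datetime import datetime, timezone

# `timestamp` value copied from meta.yaml, assumed to be seconds since the Unix epoch.
PACKED_AT = 1697281166.784828

packed = datetime.fromtimestamp(PACKED_AT, tz=timezone.utc)
print(packed.strftime("%Y-%m-%d %H:%M UTC"))
```

Under that assumption the model was packed on 14 October 2023, the same day the results were generated.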