Emrys365 committed
Commit 0fc2c3b · 1 Parent(s): bbcfb8c

Update model

Files changed (45)
  1. README.md +383 -3
  2. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/94epoch.pth +3 -0
  3. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/config.yaml +241 -0
  4. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/enhanced_test_16k/RESULTS.md +23 -0
  5. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/enhanced_test_48k/RESULTS.md +18 -0
  6. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/backward_time.png +0 -0
  7. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/clip.png +0 -0
  8. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/forward_time.png +0 -0
  9. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/grad_norm.png +0 -0
  11. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/iter_time.png +0 -0
  12. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k.png +0 -0
  13. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png +0 -0
  14. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_24k.png +0 -0
  15. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_48k.png +0 -0
  16. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k.png +0 -0
  17. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png +0 -0
  18. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k.png +0 -0
  19. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png +0 -0
  20. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k.png +0 -0
  21. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png +0 -0
  22. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_16k.png +0 -0
  23. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_8k.png +0 -0
  24. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png +0 -0
  25. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png +0 -0
  26. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/loss.png +0 -0
  27. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/loss_scale.png +0 -0
  28. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/optim0_lr0.png +0 -0
  29. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/optim_step_time.png +0 -0
  30. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_16k.png +0 -0
  31. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_16k_r.png +0 -0
  32. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_24k.png +0 -0
  33. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_48k.png +0 -0
  34. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_8k.png +0 -0
  35. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_8k_r.png +0 -0
  36. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_16k.png +0 -0
  37. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_16k_r.png +0 -0
  38. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_8k.png +0 -0
  39. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_8k_r.png +0 -0
  40. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_5ch_16k.png +0 -0
  41. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_5ch_8k.png +0 -0
  42. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_8ch_16k_r.png +0 -0
  43. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_8ch_8k_r.png +0 -0
  44. exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/train_time.png +0 -0
  45. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,383 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - audio-to-audio
6
+ language: en
7
+ datasets:
8
+ - universal_se
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ENH model
13
+
14
+ ### `wyz/vctk_dns2020_bsrnn_medium_noncausal`
15
+
16
+ This model was trained by Emrys365 using the universal_se recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ To use the model from the Python interface, you can use the following code:
24
+
25
+ ```python
26
+ import soundfile as sf
27
+ from espnet2.bin.enh_inference import SeparateSpeech
28
+
29
+ # For model downloading + loading
30
+ model = SeparateSpeech.from_pretrained(
31
+     model_tag="wyz/vctk_dns2020_bsrnn_medium_noncausal",
32
+     normalize_output_wav=True,
33
+     device="cuda",
34
+ )
35
+ # For loading a downloaded model
36
+ # model = SeparateSpeech(
37
+ #     train_config="exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/config.yaml",
38
+ #     model_file="exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/xxxx.pth",
39
+ #     normalize_output_wav=True,
40
+ #     device="cuda",
41
+ # )
42
+
43
+ audio, fs = sf.read("/path/to/noisy/utt1.flac")
44
+ enhanced = model(audio[None, :], fs=fs)[0]
45
+ ```
46
+
47
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
48
+ # RESULTS
49
+ ## Environments
50
+ - date: `Tue Feb 27 22:36:44 EST 2024`
51
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
52
+ - espnet version: `espnet 202304`
53
+ - pytorch version: `pytorch 2.0.1+cu118`
54
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
55
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
56
+
57
+
58
+ ## enhanced_test_16k
59
+
60
+
61
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
62
+ |---|---|---|---|---|---|---|---|---|---|---|
63
+ |chime4_et05_real_isolated_6ch_track|1.19|53.80|-2.46|-2.46|0.00|-31.04|3.07|3.40|3.91|3.72|
64
+ |chime4_et05_simu_isolated_6ch_track|1.58|84.56|9.02|9.02|0.00|2.74|2.94|3.25|3.92|3.33|
65
+ |dns20_tt_synthetic_no_reverb|3.28|97.86|19.82|19.82|0.00|19.74|3.35|3.59|4.13|4.03|
66
+ |reverb_et_real_8ch_multich|1.74|82.86|10.32|10.32|0.00|6.55|2.73|3.20|3.53|3.51|
67
+ |reverb_et_simu_8ch_multich|1.62|85.32|9.20|9.20|0.00|-10.54|2.78|3.27|3.49|3.60|
68
+ |whamr_tt_mix_single_reverb_max_16k|1.56|85.42|8.02|8.02|0.00|2.64|2.98|3.34|3.83|3.68|
69
+
70
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
71
+ # RESULTS
72
+ ## Environments
73
+ - date: `Sat Dec 30 18:27:20 EST 2023`
74
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
75
+ - espnet version: `espnet 202304`
76
+ - pytorch version: `pytorch 2.0.1+cu118`
77
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
78
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
79
+
80
+
81
+ ## enhanced_test_48k
82
+
83
+
84
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
85
+ |---|---|---|---|---|---|---|---|---|---|
86
+ |vctk_noisy_tt_2spk|95.75|19.55|19.55|0.00|18.72|3.16|3.47|3.97|3.55|
87
+
88
+ ## ENH config
89
+
90
+ <details><summary>expand</summary>
91
+
92
+ ```
93
+ config: conf/tuning/train_enh_bsrnn_medium_noncausal.yaml
94
+ print_config: false
95
+ log_level: INFO
96
+ dry_run: false
97
+ iterator_type: chunk
98
+ output_dir: exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw
99
+ ngpu: 1
100
+ seed: 0
101
+ num_workers: 4
102
+ num_att_plot: 3
103
+ dist_backend: nccl
104
+ dist_init_method: env://
105
+ dist_world_size: null
106
+ dist_rank: null
107
+ local_rank: 0
108
+ dist_master_addr: null
109
+ dist_master_port: null
110
+ dist_launcher: null
111
+ multiprocessing_distributed: false
112
+ unused_parameters: true
113
+ sharded_ddp: false
114
+ cudnn_enabled: true
115
+ cudnn_benchmark: false
116
+ cudnn_deterministic: true
117
+ collect_stats: false
118
+ write_collected_feats: false
119
+ max_epoch: 100
120
+ patience: 40
121
+ val_scheduler_criterion:
122
+ - valid
123
+ - loss
124
+ early_stopping_criterion:
125
+ - valid
126
+ - loss
127
+ - min
128
+ best_model_criterion:
129
+ - - valid
130
+ - loss
131
+ - min
132
+ keep_nbest_models: 1
133
+ nbest_averaging_interval: 0
134
+ grad_clip: 5.0
135
+ grad_clip_type: 2.0
136
+ grad_noise: false
137
+ accum_grad: 1
138
+ no_forward_run: false
139
+ resume: true
140
+ save_interval: 1000
141
+ train_dtype: float32
142
+ use_amp: false
143
+ log_interval: null
144
+ use_matplotlib: true
145
+ use_tensorboard: true
146
+ create_graph_in_tensorboard: false
147
+ use_wandb: false
148
+ wandb_project: null
149
+ wandb_id: null
150
+ wandb_entity: null
151
+ wandb_name: null
152
+ wandb_model_log_interval: -1
153
+ detect_anomaly: false
154
+ pretrain_path: null
155
+ init_param: []
156
+ ignore_init_mismatch: false
157
+ freeze_param: []
158
+ num_iters_per_epoch: 8000
159
+ num_iters_valid: null
160
+ batch_size: 4
161
+ valid_batch_size: null
162
+ batch_bins: 1000000
163
+ valid_batch_bins: null
164
+ train_shape_file:
165
+ - exp_vctk_dns20/enh_stats_16k/train/speech_mix_shape
166
+ - exp_vctk_dns20/enh_stats_16k/train/speech_ref1_shape
167
+ - exp_vctk_dns20/enh_stats_16k/train/dereverb_ref1_shape
168
+ valid_shape_file:
169
+ - exp_vctk_dns20/enh_stats_16k/valid/speech_mix_shape
170
+ - exp_vctk_dns20/enh_stats_16k/valid/speech_ref1_shape
171
+ - exp_vctk_dns20/enh_stats_16k/valid/dereverb_ref1_shape
172
+ batch_type: folded
173
+ valid_batch_type: null
174
+ fold_length:
175
+ - 80000
176
+ - 80000
177
+ - 80000
178
+ sort_in_batch: descending
179
+ sort_batch: descending
180
+ multiple_iterator: false
181
+ chunk_length: 32000
182
+ chunk_shift_ratio: 0.5
183
+ num_cache_chunks: 1024
184
+ chunk_excluded_key_prefixes: []
185
+ chunk_discard_short_samples: false
186
+ train_data_path_and_name_and_type:
187
+ - - dump/raw/train_vctk_noisy_dns20/wav.scp
188
+ - speech_mix
189
+ - sound
190
+ - - dump/raw/train_vctk_noisy_dns20/spk1.scp
191
+ - speech_ref1
192
+ - sound
193
+ - - dump/raw/train_vctk_noisy_dns20/dereverb1.scp
194
+ - dereverb_ref1
195
+ - sound
196
+ - - dump/raw/train_vctk_noisy_dns20/utt2category
197
+ - category
198
+ - text
199
+ - - dump/raw/train_vctk_noisy_dns20/utt2fs
200
+ - fs
201
+ - text_int
202
+ valid_data_path_and_name_and_type:
203
+ - - dump/raw/valid_vctk_noisy_dns20/wav.scp
204
+ - speech_mix
205
+ - sound
206
+ - - dump/raw/valid_vctk_noisy_dns20/spk1.scp
207
+ - speech_ref1
208
+ - sound
209
+ - - dump/raw/valid_vctk_noisy_dns20/dereverb1.scp
210
+ - dereverb_ref1
211
+ - sound
212
+ - - dump/raw/valid_vctk_noisy_dns20/utt2category
213
+ - category
214
+ - text
215
+ - - dump/raw/valid_vctk_noisy_dns20/utt2fs
216
+ - fs
217
+ - text_int
218
+ allow_variable_data_keys: false
219
+ max_cache_size: 0.0
220
+ max_cache_fd: 32
221
+ allow_multi_rates: true
222
+ valid_max_cache_size: null
223
+ exclude_weight_decay: false
224
+ exclude_weight_decay_conf: {}
225
+ optim: adam
226
+ optim_conf:
227
+ lr: 0.001
228
+ eps: 1.0e-08
229
+ weight_decay: 1.0e-05
230
+ scheduler: steplr
231
+ scheduler_conf:
232
+ step_size: 2
233
+ gamma: 0.99
234
+ init: null
235
+ model_conf:
236
+ normalize_variance_per_ch: true
237
+ categories:
238
+ - 1ch_8k
239
+ - 1ch_8k_r
240
+ - 1ch_16k_r
241
+ - 1ch_48k
242
+ - 1ch_24k
243
+ - 1ch_16k
244
+ - 2ch_8k
245
+ - 2ch_8k_r
246
+ - 2ch_16k
247
+ - 2ch_16k_r
248
+ - 5ch_8k
249
+ - 5ch_16k
250
+ - 8ch_8k_r
251
+ - 8ch_16k_r
252
+ criterions:
253
+ - name: mr_l1_tfd
254
+ conf:
255
+ window_sz:
256
+ - 256
257
+ - 512
258
+ - 768
259
+ - 1024
260
+ hop_sz: null
261
+ eps: 1.0e-08
262
+ time_domain_weight: 0.5
263
+ normalize_variance: true
264
+ wrapper: fixed_order
265
+ wrapper_conf:
266
+ weight: 1.0
267
+ - name: si_snr
268
+ conf:
269
+ eps: 1.0e-07
270
+ wrapper: fixed_order
271
+ wrapper_conf:
272
+ weight: 0.0
273
+ speech_volume_normalize: null
274
+ rir_scp: null
275
+ rir_apply_prob: 1.0
276
+ noise_scp: null
277
+ noise_apply_prob: 1.0
278
+ noise_db_range: '13_15'
279
+ short_noise_thres: 0.5
280
+ use_reverberant_ref: false
281
+ num_spk: 1
282
+ num_noise_type: 1
283
+ sample_rate: 8000
284
+ force_single_channel: true
285
+ channel_reordering: true
286
+ categories:
287
+ - 1ch_8k
288
+ - 1ch_8k_r
289
+ - 1ch_16k_r
290
+ - 1ch_48k
291
+ - 1ch_24k
292
+ - 1ch_16k
293
+ - 2ch_8k
294
+ - 2ch_8k_r
295
+ - 2ch_16k
296
+ - 2ch_16k_r
297
+ - 5ch_8k
298
+ - 5ch_16k
299
+ - 8ch_8k_r
300
+ - 8ch_16k_r
301
+ speech_segment: null
302
+ avoid_allzero_segment: true
303
+ flexible_numspk: false
304
+ dynamic_mixing: false
305
+ utt2spk: null
306
+ dynamic_mixing_gain_db: 0.0
307
+ encoder: stft
308
+ encoder_conf:
309
+ n_fft: 960
310
+ hop_length: 480
311
+ use_builtin_complex: true
312
+ default_fs: 48000
313
+ separator: bsrnn
314
+ separator_conf:
315
+ num_spk: 1
316
+ num_channels: 128
317
+ num_layers: 6
318
+ target_fs: 48000
319
+ ref_channel: 0
320
+ causal: false
321
+ decoder: stft
322
+ decoder_conf:
323
+ n_fft: 960
324
+ hop_length: 480
325
+ default_fs: 48000
326
+ mask_module: multi_mask
327
+ mask_module_conf: {}
328
+ preprocessor: enh
329
+ preprocessor_conf: {}
330
+ required:
331
+ - output_dir
332
+ version: '202304'
333
+ distributed: false
334
+ ```
335
+
336
+ </details>
337
+
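The `scheduler: steplr` settings in the config above (`step_size: 2`, `gamma: 0.99`, starting from `lr: 0.001`) describe a slow staircase decay of the learning rate. The sketch below estimates the rate in effect during a given epoch. It assumes the scheduler is stepped once per completed epoch, which is an assumption about the training loop and is not stated in this commit; the function name `steplr_learning_rate` is hypothetical.

```python
def steplr_learning_rate(epoch, base_lr=0.001, step_size=2, gamma=0.99):
    """Estimate the learning rate used during `epoch` (1-indexed).

    Assumption: the scheduler is stepped once per completed epoch, so the
    rate is multiplied by `gamma` once every `step_size` completed epochs.
    """
    if epoch < 1:
        raise ValueError("epoch must be >= 1")
    completed_epochs = epoch - 1
    return base_lr * gamma ** (completed_epochs // step_size)


# Print the estimated rate for a few epochs, including epoch 94,
# which matches the name of the uploaded checkpoint (94epoch.pth).
for epoch in (1, 2, 3, 94, 100):
    print(f"epoch {epoch:3d}: lr = {steplr_learning_rate(epoch):.6f}")
```

Under these assumptions the rate drops by 1% every two epochs, so it stays within the same order of magnitude over the full `max_epoch: 100` run.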
338
+
339
+
340
+ ### Citing ESPnet
341
+
342
+ ```bibtex
343
+ @inproceedings{watanabe2018espnet,
344
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
345
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
346
+ year={2018},
347
+ booktitle={Proceedings of Interspeech},
348
+ pages={2207--2211},
349
+ doi={10.21437/Interspeech.2018-1456},
350
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
351
+ }
352
+
353
+
354
+ @inproceedings{ESPnet-SE,
355
+ author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
356
+ Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
357
+ title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
358
+ booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
359
+ pages = {785--792},
360
+ publisher = {{IEEE}},
361
+ year = {2021},
362
+ url = {https://doi.org/10.1109/SLT48900.2021.9383615},
363
+ doi = {10.1109/SLT48900.2021.9383615},
364
+ timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
365
+ biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
366
+ bibsource = {dblp computer science bibliography, https://dblp.org}
367
+ }
368
+
369
+
370
+ ```
371
+
372
+ or arXiv:
373
+
374
+ ```bibtex
375
+ @misc{watanabe2018espnet,
376
+ title={ESPnet: End-to-End Speech Processing Toolkit},
377
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
378
+ year={2018},
379
+ eprint={1804.00015},
380
+ archivePrefix={arXiv},
381
+ primaryClass={cs.CL}
382
+ }
383
+ ```
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/94epoch.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c132fdcb37ab1de3a68e5b47c43c4b9d5a0f65c1a5a2a019cbaf16a839bc4cc8
3
+ size 67803273
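The three lines added for `94epoch.pth` are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and the byte size of the real file. As a minimal sketch, a downloaded checkpoint could be checked against the pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and only the Python standard library is used.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash in `pointer`."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a large checkpoint is never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c132fdcb37ab1de3a68e5b47c43c4b9d5a0f65c1a5a2a019cbaf16a839bc4cc8
size 67803273
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("/path/to/94epoch.pth", pointer)` on a local copy (the path is a placeholder) would then report whether the download is complete and uncorrupted.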
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/config.yaml ADDED
@@ -0,0 +1,241 @@
1
+ config: conf/tuning/train_enh_bsrnn_medium_noncausal.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ dry_run: false
5
+ iterator_type: chunk
6
+ output_dir: exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw
7
+ ngpu: 1
8
+ seed: 0
9
+ num_workers: 4
10
+ num_att_plot: 3
11
+ dist_backend: nccl
12
+ dist_init_method: env://
13
+ dist_world_size: null
14
+ dist_rank: null
15
+ local_rank: 0
16
+ dist_master_addr: null
17
+ dist_master_port: null
18
+ dist_launcher: null
19
+ multiprocessing_distributed: false
20
+ unused_parameters: true
21
+ sharded_ddp: false
22
+ cudnn_enabled: true
23
+ cudnn_benchmark: false
24
+ cudnn_deterministic: true
25
+ collect_stats: false
26
+ write_collected_feats: false
27
+ max_epoch: 100
28
+ patience: 40
29
+ val_scheduler_criterion:
30
+ - valid
31
+ - loss
32
+ early_stopping_criterion:
33
+ - valid
34
+ - loss
35
+ - min
36
+ best_model_criterion:
37
+ - - valid
38
+ - loss
39
+ - min
40
+ keep_nbest_models: 1
41
+ nbest_averaging_interval: 0
42
+ grad_clip: 5.0
43
+ grad_clip_type: 2.0
44
+ grad_noise: false
45
+ accum_grad: 1
46
+ no_forward_run: false
47
+ resume: true
48
+ save_interval: 1000
49
+ train_dtype: float32
50
+ use_amp: false
51
+ log_interval: null
52
+ use_matplotlib: true
53
+ use_tensorboard: true
54
+ create_graph_in_tensorboard: false
55
+ use_wandb: false
56
+ wandb_project: null
57
+ wandb_id: null
58
+ wandb_entity: null
59
+ wandb_name: null
60
+ wandb_model_log_interval: -1
61
+ detect_anomaly: false
62
+ pretrain_path: null
63
+ init_param: []
64
+ ignore_init_mismatch: false
65
+ freeze_param: []
66
+ num_iters_per_epoch: 8000
67
+ num_iters_valid: null
68
+ batch_size: 4
69
+ valid_batch_size: null
70
+ batch_bins: 1000000
71
+ valid_batch_bins: null
72
+ train_shape_file:
73
+ - exp_vctk_dns20/enh_stats_16k/train/speech_mix_shape
74
+ - exp_vctk_dns20/enh_stats_16k/train/speech_ref1_shape
75
+ - exp_vctk_dns20/enh_stats_16k/train/dereverb_ref1_shape
76
+ valid_shape_file:
77
+ - exp_vctk_dns20/enh_stats_16k/valid/speech_mix_shape
78
+ - exp_vctk_dns20/enh_stats_16k/valid/speech_ref1_shape
79
+ - exp_vctk_dns20/enh_stats_16k/valid/dereverb_ref1_shape
80
+ batch_type: folded
81
+ valid_batch_type: null
82
+ fold_length:
83
+ - 80000
84
+ - 80000
85
+ - 80000
86
+ sort_in_batch: descending
87
+ sort_batch: descending
88
+ multiple_iterator: false
89
+ chunk_length: 32000
90
+ chunk_shift_ratio: 0.5
91
+ num_cache_chunks: 1024
92
+ chunk_excluded_key_prefixes: []
93
+ chunk_discard_short_samples: false
94
+ train_data_path_and_name_and_type:
95
+ - - dump/raw/train_vctk_noisy_dns20/wav.scp
96
+ - speech_mix
97
+ - sound
98
+ - - dump/raw/train_vctk_noisy_dns20/spk1.scp
99
+ - speech_ref1
100
+ - sound
101
+ - - dump/raw/train_vctk_noisy_dns20/dereverb1.scp
102
+ - dereverb_ref1
103
+ - sound
104
+ - - dump/raw/train_vctk_noisy_dns20/utt2category
105
+ - category
106
+ - text
107
+ - - dump/raw/train_vctk_noisy_dns20/utt2fs
108
+ - fs
109
+ - text_int
110
+ valid_data_path_and_name_and_type:
111
+ - - dump/raw/valid_vctk_noisy_dns20/wav.scp
112
+ - speech_mix
113
+ - sound
114
+ - - dump/raw/valid_vctk_noisy_dns20/spk1.scp
115
+ - speech_ref1
116
+ - sound
117
+ - - dump/raw/valid_vctk_noisy_dns20/dereverb1.scp
118
+ - dereverb_ref1
119
+ - sound
120
+ - - dump/raw/valid_vctk_noisy_dns20/utt2category
121
+ - category
122
+ - text
123
+ - - dump/raw/valid_vctk_noisy_dns20/utt2fs
124
+ - fs
125
+ - text_int
126
+ allow_variable_data_keys: false
127
+ max_cache_size: 0.0
128
+ max_cache_fd: 32
129
+ allow_multi_rates: true
130
+ valid_max_cache_size: null
131
+ exclude_weight_decay: false
132
+ exclude_weight_decay_conf: {}
133
+ optim: adam
134
+ optim_conf:
135
+ lr: 0.001
136
+ eps: 1.0e-08
137
+ weight_decay: 1.0e-05
138
+ scheduler: steplr
139
+ scheduler_conf:
140
+ step_size: 2
141
+ gamma: 0.99
142
+ init: null
143
+ model_conf:
144
+ normalize_variance_per_ch: true
145
+ categories:
146
+ - 1ch_8k
147
+ - 1ch_8k_r
148
+ - 1ch_16k_r
149
+ - 1ch_48k
150
+ - 1ch_24k
151
+ - 1ch_16k
152
+ - 2ch_8k
153
+ - 2ch_8k_r
154
+ - 2ch_16k
155
+ - 2ch_16k_r
156
+ - 5ch_8k
157
+ - 5ch_16k
158
+ - 8ch_8k_r
159
+ - 8ch_16k_r
160
+ criterions:
161
+ - name: mr_l1_tfd
162
+ conf:
163
+ window_sz:
164
+ - 256
165
+ - 512
166
+ - 768
167
+ - 1024
168
+ hop_sz: null
169
+ eps: 1.0e-08
170
+ time_domain_weight: 0.5
171
+ normalize_variance: true
172
+ wrapper: fixed_order
173
+ wrapper_conf:
174
+ weight: 1.0
175
+ - name: si_snr
176
+ conf:
177
+ eps: 1.0e-07
178
+ wrapper: fixed_order
179
+ wrapper_conf:
180
+ weight: 0.0
181
+ speech_volume_normalize: null
182
+ rir_scp: null
183
+ rir_apply_prob: 1.0
184
+ noise_scp: null
185
+ noise_apply_prob: 1.0
186
+ noise_db_range: '13_15'
187
+ short_noise_thres: 0.5
188
+ use_reverberant_ref: false
189
+ num_spk: 1
190
+ num_noise_type: 1
191
+ sample_rate: 8000
192
+ force_single_channel: true
193
+ channel_reordering: true
194
+ categories:
195
+ - 1ch_8k
196
+ - 1ch_8k_r
197
+ - 1ch_16k_r
198
+ - 1ch_48k
199
+ - 1ch_24k
200
+ - 1ch_16k
201
+ - 2ch_8k
202
+ - 2ch_8k_r
203
+ - 2ch_16k
204
+ - 2ch_16k_r
205
+ - 5ch_8k
206
+ - 5ch_16k
207
+ - 8ch_8k_r
208
+ - 8ch_16k_r
209
+ speech_segment: null
210
+ avoid_allzero_segment: true
211
+ flexible_numspk: false
212
+ dynamic_mixing: false
213
+ utt2spk: null
214
+ dynamic_mixing_gain_db: 0.0
215
+ encoder: stft
216
+ encoder_conf:
217
+ n_fft: 960
218
+ hop_length: 480
219
+ use_builtin_complex: true
220
+ default_fs: 48000
221
+ separator: bsrnn
222
+ separator_conf:
223
+ num_spk: 1
224
+ num_channels: 128
225
+ num_layers: 6
226
+ target_fs: 48000
227
+ ref_channel: 0
228
+ causal: false
229
+ decoder: stft
230
+ decoder_conf:
231
+ n_fft: 960
232
+ hop_length: 480
233
+ default_fs: 48000
234
+ mask_module: multi_mask
235
+ mask_module_conf: {}
236
+ preprocessor: enh
237
+ preprocessor_conf: {}
238
+ required:
239
+ - output_dir
240
+ version: '202304'
241
+ distributed: false
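The `encoder_conf` and `decoder_conf` values in this config (`n_fft: 960`, `hop_length: 480`, `default_fs: 48000`) correspond to a 20 ms analysis window with a 10 ms hop at 48 kHz. The sketch below works out the equivalent sizes at the other sampling rates listed under `categories`. It assumes the window and hop durations are held fixed when audio at a different rate is processed; that is an assumption about how the STFT is reconfigured, not something shown in this commit, and `stft_params_for_fs` is a hypothetical helper.

```python
def stft_params_for_fs(fs, n_fft=960, hop_length=480, default_fs=48000):
    """Scale STFT sizes from `default_fs` to `fs`, keeping durations fixed.

    Assumption: the window and hop are defined in time (20 ms / 10 ms here)
    and converted to samples for each sampling rate.
    """
    scaled_n_fft = n_fft * fs // default_fs
    scaled_hop = hop_length * fs // default_fs
    return {
        "n_fft": scaled_n_fft,
        "hop_length": scaled_hop,
        "freq_bins": scaled_n_fft // 2 + 1,  # one-sided spectrum
        "window_ms": 1000.0 * scaled_n_fft / fs,
        "hop_ms": 1000.0 * scaled_hop / fs,
    }


# Sampling rates that appear in the `categories` list of the config.
for fs in (8000, 16000, 24000, 48000):
    print(fs, stft_params_for_fs(fs))
```

Under this assumption the frequency resolution (50 Hz per bin) is the same at every rate, and only the number of bins changes.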
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/enhanced_test_16k/RESULTS.md ADDED
@@ -0,0 +1,23 @@
1
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Tue Feb 27 22:36:44 EST 2024`
5
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
6
+ - espnet version: `espnet 202304`
7
+ - pytorch version: `pytorch 2.0.1+cu118`
8
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
9
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
10
+
11
+
12
+ ## enhanced_test_16k
13
+
14
+
15
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
16
+ |---|---|---|---|---|---|---|---|---|---|---|
17
+ |chime4_et05_real_isolated_6ch_track|1.19|53.80|-2.46|-2.46|0.00|-31.04|3.07|3.40|3.91|3.72|
18
+ |chime4_et05_simu_isolated_6ch_track|1.58|84.56|9.02|9.02|0.00|2.74|2.94|3.25|3.92|3.33|
19
+ |dns20_tt_synthetic_no_reverb|3.28|97.86|19.82|19.82|0.00|19.74|3.35|3.59|4.13|4.03|
20
+ |reverb_et_real_8ch_multich|1.74|82.86|10.32|10.32|0.00|6.55|2.73|3.20|3.53|3.51|
21
+ |reverb_et_simu_8ch_multich|1.62|85.32|9.20|9.20|0.00|-10.54|2.78|3.27|3.49|3.60|
22
+ |whamr_tt_mix_single_reverb_max_16k|1.56|85.42|8.02|8.02|0.00|2.64|2.98|3.34|3.83|3.68|
23
+
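The score table in this `RESULTS.md` is plain pipe-delimited Markdown, so it can be loaded for further comparison without extra dependencies. The following is a minimal sketch; `parse_markdown_table` is a hypothetical helper, and the table text is copied from the diff above.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table into a list of row dicts.

    The first column is kept as a string; all other cells are parsed as floats.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        row = {header[0]: cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = float(cell)
        rows.append(row)
    return rows


TABLE = """
|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|---|
|chime4_et05_real_isolated_6ch_track|1.19|53.80|-2.46|-2.46|0.00|-31.04|3.07|3.40|3.91|3.72|
|chime4_et05_simu_isolated_6ch_track|1.58|84.56|9.02|9.02|0.00|2.74|2.94|3.25|3.92|3.33|
|dns20_tt_synthetic_no_reverb|3.28|97.86|19.82|19.82|0.00|19.74|3.35|3.59|4.13|4.03|
|reverb_et_real_8ch_multich|1.74|82.86|10.32|10.32|0.00|6.55|2.73|3.20|3.53|3.51|
|reverb_et_simu_8ch_multich|1.62|85.32|9.20|9.20|0.00|-10.54|2.78|3.27|3.49|3.60|
|whamr_tt_mix_single_reverb_max_16k|1.56|85.42|8.02|8.02|0.00|2.64|2.98|3.34|3.83|3.68|
"""

rows = parse_markdown_table(TABLE)

# List the test sets from highest to lowest SI_SNR.
for row in sorted(rows, key=lambda r: r["SI_SNR"], reverse=True):
    print(f"{row['dataset']:<40} SI_SNR={row['SI_SNR']:7.2f}  PESQ_WB={row['PESQ_WB']:.2f}")
```

Sorting this way makes the spread between the matched condition (`dns20_tt_synthetic_no_reverb`) and the real-recording sets easy to see at a glance.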
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/enhanced_test_48k/RESULTS.md ADDED
@@ -0,0 +1,18 @@
1
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Sat Dec 30 18:27:20 EST 2023`
5
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
6
+ - espnet version: `espnet 202304`
7
+ - pytorch version: `pytorch 2.0.1+cu118`
8
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
9
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
10
+
11
+
12
+ ## enhanced_test_48k
13
+
14
+
15
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
16
+ |---|---|---|---|---|---|---|---|---|---|
17
+ |vctk_noisy_tt_2spk|95.75|19.55|19.55|0.00|18.72|3.16|3.47|3.97|3.55|
18
+
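For readers unfamiliar with the `SI_SNR` column in these tables, the sketch below shows the commonly used definition of scale-invariant SNR in pure Python. It is an illustration of the metric only and is not necessarily identical to the scoring code that produced the numbers above; `si_snr_db` is a hypothetical helper name.

```python
import math


def si_snr_db(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]   # remove DC offset
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Whatever is left after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy signals: a sine reference plus an orthogonal cosine at 1/10 amplitude.
N = 1600
clean = [math.sin(2 * math.pi * 10 * i / N) for i in range(N)]
distortion = [0.1 * math.cos(2 * math.pi * 10 * i / N) for i in range(N)]
noisy = [c + d for c, d in zip(clean, distortion)]

print(f"SI-SNR of toy example: {si_snr_db(noisy, clean):.2f} dB")
```

Because the distortion has one tenth of the reference amplitude and is orthogonal to it, the toy example comes out close to 20 dB. Rescaling the estimate leaves the value essentially unchanged, which is the sense in which the metric is scale-invariant.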
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/backward_time.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/clip.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/forward_time.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/gpu_max_cached_mem_GB.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/grad_norm.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/iter_time.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_24k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_48k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_16k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_8k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/loss.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/loss_scale.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/optim0_lr0.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/optim_step_time.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_16k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_16k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_24k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_48k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_8k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_1ch_8k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_16k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_16k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_8k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_2ch_8k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_5ch_16k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_5ch_8k.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_8ch_16k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/si_snr_loss_8ch_8k_r.png ADDED
exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202304'
2
+ files:
3
+ model_file: exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/94epoch.pth
4
+ python: "3.8.16 (default, Mar 2 2023, 03:21:46) \n[GCC 11.2.0]"
5
+ timestamp: 1722936552.419344
6
+ torch: 2.0.1+cu118
7
+ yaml_files:
8
+ train_config: exp_vctk_dns20/enh_train_enh_bsrnn_medium_noncausal_raw/config.yaml
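The `timestamp` field in `meta.yaml` is a bare floating-point number. Assuming it is a Unix epoch time in seconds (an assumption; the file does not say so), it can be converted to a readable UTC date with the standard library:

```python
from datetime import datetime, timezone

# Value of `timestamp` copied from meta.yaml above.
META_TIMESTAMP = 1722936552.419344

# Interpret the value as seconds since the Unix epoch, in UTC.
packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

Under that assumption the value falls on 6 August 2024, which is later than the evaluation dates recorded in the two `RESULTS.md` files.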