wyz/vctk_bsrnn_large_double_noncausal


wyz committed on Aug 7, 2024

Commit 756b9e3 (verified) · 1 parent: 84fdcc4

Update README.md

Files changed (1)

README.md +55 -8


@@ -5,7 +5,7 @@ tags:
 - audio-to-audio
 language: en
 datasets:
-- universal_se
 license: cc-by-4.0
 ---
@@ -13,22 +13,69 @@ license: cc-by-4.0
 ### `wyz/vctk_bsrnn_large_double_noncausal`
-This model was trained by Emrys365 using universal_se recipe in [espnet](https://github.com/espnet/espnet/).
 ### Demo: How to use in ESPnet2
 Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
 if you haven't done that already.
-```bash
-cd espnet
-git checkout 443028662106472c60fe8bd892cb277e5b488651
-pip install -e .
-cd egs2/universal_se/enh1
-./run.sh --skip_data_prep false --skip_train true --download_model wyz/vctk_bsrnn_large_double_noncausal
 ```
 ## ENH config

 - audio-to-audio
 language: en
 datasets:
+  - VCTK_DEMAND
 license: cc-by-4.0
 ---
 ### `wyz/vctk_bsrnn_large_double_noncausal`
+This model was trained by wyz based on the universal_se_v1 recipe in [espnet](https://github.com/espnet/espnet/).
 ### Demo: How to use in ESPnet2
 Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
 if you haven't done that already.
+To use the model in the Python interface, you could use the following code:
+```python
+import soundfile as sf
+from espnet2.bin.enh_inference import SeparateSpeech
+# For model downloading + loading
+model = SeparateSpeech.from_pretrained(
+    model_tag="wyz/vctk_bsrnn_large_double_noncausal",
+    normalize_output_wav=True,
+    device="cuda",
+)
+# For loading a downloaded model
+# model = SeparateSpeech(
+#     train_config="exp_vctk/enh_train_enh_bsrnn_large_double_noncausal_raw/config.yaml",
+#     model_file="exp_vctk/enh_train_enh_bsrnn_large_double_noncausal_raw/xxxx.pth",
+#     normalize_output_wav=True,
+#     device="cuda",
+# )
+audio, fs = sf.read("/path/to/noisy/utt1.flac")
+enhanced = model(audio[None, :], fs=fs)[0]
 ```
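Note on the added snippet: it hard-codes `device="cuda"`, which may fail on a machine without a GPU. A minimal sketch of a fallback follows, assuming PyTorch is the backend (a PyTorch version is listed under Environments). The helper name `pick_device` is hypothetical and not part of ESPnet.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()  # hypothetical helper, see the note above
print(device)
```

The result could then be passed as `device=pick_device()` in place of `device="cuda"` in the `SeparateSpeech.from_pretrained(...)` call shown in the diff.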
+<!-- Generated by ./scripts/utils/show_enh_score.sh -->
+# RESULTS
+## Environments
+- date: `Wed Feb 28 01:32:14 EST 2024`
+- python version: `3.8.16 (default, Mar  2 2023, 03:21:46)  [GCC 11.2.0]`
+- espnet version: `espnet 202304`
+- pytorch version: `pytorch 2.0.1+cu118`
+- Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+  - Commit date: `Thu May 11 03:32:59 2023 +0000`
+## enhanced_test_16k
+|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+|---|---|---|---|---|---|---|---|---|---|---|
+|chime4_et05_real_isolated_6ch_track|1.13|51.08|-3.04|-3.04|0.00|-31.26|2.86|3.21|3.77|3.48|
+|chime4_et05_simu_isolated_6ch_track|1.28|75.86|6.61|6.61|0.00|0.60|2.75|3.07|3.80|3.17|
+|dns20_tt_synthetic_no_reverb|2.56|95.64|15.96|15.96|0.00|15.27|3.26|3.56|4.01|3.91|
+|reverb_et_real_8ch_multich|1.08|45.15|1.67|1.67|0.00|-2.75|2.30|2.62|3.57|3.01|
+|reverb_et_simu_8ch_multich|1.66|77.85|8.05|8.05|0.00|-12.78|2.87|3.18|3.82|3.51|
+|whamr_tt_mix_single_reverb_max_16k|1.36|78.73|5.96|5.96|0.00|1.37|2.77|3.09|3.83|3.45|
+## enhanced_test_48k
+|dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+|---|---|---|---|---|---|---|---|---|---|
+|vctk_noisy_tt_2spk|95.31|20.21|20.21|0.00|19.23|3.15|3.45|3.99|3.52|
 ## ENH config