---
language:
- en
thumbnail: null
tags:
- audio-to-audio
- Speech Enhancement
- Voicebank-DEMAND
- UNIVERSE
- UNIVERSE++
- Diffusion
- pytorch
- open-universe
license: apache-2.0
datasets:
- Voicebank-DEMAND
metrics:
- SI-SNR
- PESQ
- SIG
- BAK
- OVRL
model-index:
- name: universe++
  results:
  - task:
      name: Speech Enhancement
      type: speech-enhancement
    dataset:
      name: DEMAND
      type: demand
      split: test-set
      args:
        language: en
    metrics:
    - name: DNSMOS SIG
      type: sig
      value: 3.493
    - name: DNSMOS BAK
      type: bak
      value: 4.042
    - name: DNSMOS OVRL
      type: ovrl
      value: 3.205
    - name: PESQ
      type: pesq
      value: 3.017
    - name: SI-SDR
      type: si-sdr
      value: 18.629
---
# open-universe: Generative Speech Enhancement with Score-based Diffusion and Adversarial Training
This repository contains the configurations and weights for the UNIVERSE++ and UNIVERSE models implemented in open-universe.
The models were trained on the Voicebank-DEMAND dataset at 16 kHz.
Performance on the Voicebank-DEMAND test split is given in the following table.
| model      | SI-SDR | PESQ-WB | STOI-ext | LSD   | LPS   | OVRL  | SIG   | BAK   |
|------------|--------|---------|----------|-------|-------|-------|-------|-------|
| UNIVERSE++ | 18.624 | 3.017   | 0.864    | 4.867 | 0.937 | 3.200 | 3.489 | 4.040 |
| UNIVERSE   | 17.600 | 2.830   | 0.844    | 6.318 | 0.920 | 3.157 | 3.457 | 4.013 |
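SI-SDR in the table measures scale-invariant signal fidelity in dB (higher is better). As a reference for the definition, here is a minimal pure-Python sketch of the metric; the signals below are synthetic placeholders, not model output:

```python
import math

def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB: project the estimate onto the reference,
    then compare the energy of the projected target to the residual."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy  # optimal scaling of the reference
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)

# Synthetic example: a sine "reference" plus a small alternating perturbation.
ref = [math.sin(0.01 * n) for n in range(1000)]
est = [r + 0.01 * ((-1) ** n) for n, r in enumerate(ref)]
print(f"SI-SDR: {si_sdr(ref, est):.1f} dB")
```

Because of the optimal scaling step, rescaling the estimate leaves the score unchanged, which is what makes the metric scale-invariant.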
## Usage
Start by installing [open-universe](https://github.com/line/open-universe). We use conda to simplify the installation.
```shell
git clone https://github.com/line/open-universe.git
cd open-universe
conda env create -f environment.yaml
conda activate open-universe
python -m pip install .
```
Then the models can be used as follows.
```shell
# UNIVERSE++ (default model)
python -m open_universe.bin.enhance <input/folder> <output/folder> \
    --model line-corporation/open-universe:plusplus

# UNIVERSE
python -m open_universe.bin.enhance <input/folder> <output/folder> \
    --model line-corporation/open-universe:original
```
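For batch processing from Python, the CLI above can be wrapped with `subprocess`. This is a sketch, not part of the open-universe API; the `build_command` and `enhance` helpers are hypothetical names:

```python
import subprocess

DEFAULT_MODEL = "line-corporation/open-universe:plusplus"

def build_command(input_dir, output_dir, model=DEFAULT_MODEL):
    # Mirrors the CLI invocation shown above.
    return [
        "python", "-m", "open_universe.bin.enhance",
        input_dir, output_dir,
        "--model", model,
    ]

def enhance(input_dir, output_dir, model=DEFAULT_MODEL):
    # Runs the enhancement CLI; raises CalledProcessError on failure.
    subprocess.run(build_command(input_dir, output_dir, model), check=True)
```

Calling `enhance("noisy_wavs", "enhanced_wavs")` would then run the same command as the shell example, with the directory names being placeholders.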
## Referencing open-universe and UNIVERSE++
If you use these models in your work, please consider citing the following paper.
```bibtex
@inproceedings{universepp,
  author={Scheibler, Robin and Fujita, Yusuke and Shirahata, Yuma and Komatsu, Tatsuya},
  title={Universal Score-based Speech Enhancement with High Content Preservation},
  booktitle={Proc. Interspeech 2024},
  month=sep,
  year=2024
}
```
## Referencing UNIVERSE
```bibtex
@misc{universe,
  author={Serr\`a, Joan and Pascual, Santiago and Pons, Jordi and Araz, R. Oguz and Scaini, Davide},
  title={Universal Speech Enhancement with Score-based Diffusion},
  howpublished={arXiv:2206.03065},
  month=sep,
  year=2022
}
```