---
language:
- "en"
thumbnail:
tags:
- audio-to-audio
- Speech Enhancement
- Voicebank-DEMAND
- UNIVERSE
- UNIVERSE++
- Diffusion
- pytorch
- open-universe
license: "apache-2.0"
datasets:
- Voicebank-DEMAND
metrics:
- SI-SNR
- PESQ
- SIG
- BAK
- OVRL
model-index:
- name: universe++
  results:
  - task:
      name: Speech Enhancement
      type: speech-enhancement
    dataset:
      name: DEMAND
      type: demand
      split: test-set
      args:
        language: en
    metrics:
    - name: DNSMOS SIG
      type: sig
      value: '3.493'
    - name: DNSMOS BAK
      type: bak
      value: '4.042'
    - name: DNSMOS OVRL
      type: ovrl
      value: '3.205'
    - name: PESQ
      type: pesq
      value: 3.017
    - name: SI-SDR
      type: si-sdr
      value: 18.629
---
# open-universe: Generative Speech Enhancement with Score-based Diffusion and Adversarial Training

This repository contains the configurations and weights for the [UNIVERSE++](https://arxiv.org/abs/2406.12194) and
[UNIVERSE](https://arxiv.org/abs/2206.03065) models implemented in [open-universe](https://github.com/line/open-universe).

The models were trained on the [Voicebank-DEMAND](https://datashare.ed.ac.uk/handle/10283/2791) dataset at 16 kHz.
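
Since the models were trained at 16 kHz, it can be useful to know the sampling rate of your input files before enhancing them. The following is a minimal sketch that uses only the Python standard library and works on PCM WAV files. The helper name `find_rate_mismatches` is hypothetical and not part of `open-universe`; whether the enhancement script resamples inputs on its own is not stated here, so treat this as an inspection aid only.

```python
import wave
from pathlib import Path

TARGET_RATE = 16_000  # sampling rate stated in this model card


def find_rate_mismatches(folder, target_rate=TARGET_RATE):
    """Return {filename: rate} for PCM WAV files whose rate differs from target_rate.

    Hypothetical helper for illustration; it only looks at "*.wav" files
    directly inside `folder` and relies on the stdlib `wave` module,
    which reads uncompressed PCM WAV only.
    """
    mismatches = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != target_rate:
            mismatches[path.name] = rate
    return mismatches
```

Calling `find_rate_mismatches("noisy")` on a folder of recordings returns an empty dictionary when every file is already at 16 kHz.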

The table below reports performance on the test split of Voicebank-DEMAND. OVRL, SIG, and BAK are the DNSMOS overall, speech-signal, and background-noise scores.

| model      | SI-SDR (dB) | PESQ-WB | STOI-ext | LSD   | LPS   | OVRL  | SIG   | BAK   |
|------------|-------------|---------|----------|-------|-------|-------|-------|-------|
| UNIVERSE++ | 18.624      | 3.017   | 0.864    | 4.867 | 0.937 | 3.200 | 3.489 | 4.040 |
| UNIVERSE   | 17.600      | 2.830   | 0.844    | 6.318 | 0.920 | 3.157 | 3.457 | 4.013 |
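
To make the SI-SDR metric reported in the table concrete, here is a small pure-Python sketch of the common scale-invariant SDR definition: the estimate is projected onto the reference, and the ratio of projected energy to residual energy is expressed in dB. This is an illustration of the formula only, not the evaluation code used by `open-universe`; some implementations also remove the mean of both signals first, which is omitted here.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    # Optimal scaling of the reference: alpha = <estimate, reference> / <reference, reference>
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy
    # Part of the estimate explained by the reference, and what is left over
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    # Note: a perfect estimate gives zero residual energy and raises ZeroDivisionError
    return 10.0 * math.log10(target_energy / residual_energy)


reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 1.1, -0.9]  # reference plus a constant offset of 0.1
print(round(si_sdr(reference, estimate), 3))  # approximately 20.0 dB
```

Rescaling the estimate by any non-zero constant leaves the value unchanged, which is the sense in which the metric is scale-invariant.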

## Usage

Start by installing `open-universe`.
We use conda to simplify the installation.
```sh
git clone https://github.com/line/open-universe.git
cd open-universe
conda env create -f environment.yaml
conda activate open-universe
python -m pip install .
```

Then the models can be used as follows.
```sh
# UNIVERSE++ (default model)
python -m open_universe.bin.enhance <input/folder> <output/folder> \
    --model line-corporation/open-universe:plusplus

# UNIVERSE
python -m open_universe.bin.enhance <input/folder> <output/folder> \
    --model line-corporation/open-universe:original
```
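
If you prefer to drive the enhancement from a Python script, one option is to wrap the command line shown above with `subprocess`. The sketch below assumes `open-universe` is installed in the current Python environment and uses only the arguments that appear in this README (input folder, output folder, `--model`). The helpers `build_enhance_command` and `enhance_folder` and the `MODELS` mapping are hypothetical names introduced for illustration; they are not part of the `open-universe` API.

```python
import subprocess
import sys

# Model identifiers as given in this README
MODELS = {
    "plusplus": "line-corporation/open-universe:plusplus",
    "original": "line-corporation/open-universe:original",
}


def build_enhance_command(input_dir, output_dir, variant="plusplus"):
    """Build the argument list for the enhance CLI (hypothetical helper)."""
    if variant not in MODELS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(MODELS)}")
    return [
        sys.executable, "-m", "open_universe.bin.enhance",
        str(input_dir), str(output_dir),
        "--model", MODELS[variant],
    ]


def enhance_folder(input_dir, output_dir, variant="plusplus"):
    """Run the enhancement CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_enhance_command(input_dir, output_dir, variant), check=True)
```

For example, `enhance_folder("noisy", "enhanced", variant="original")` would run the UNIVERSE model on the files in `noisy`. Using `sys.executable` keeps the call inside the interpreter of the active conda environment.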

## Referencing open-universe and UNIVERSE++

If you use these models in your work, please consider citing the following paper.

```latex
@inproceedings{universepp,
    author={Scheibler, Robin and Fujita, Yusuke and Shirahata, Yuma and Komatsu, Tatsuya},
    title={Universal Score-based Speech Enhancement with High Content Preservation},
    booktitle={Proc. Interspeech 2024},
    month=sep,
    year=2024
}
```

## Referencing UNIVERSE

```latex
@misc{universe,
    author={Serr\`a, Joan and Pascual, Santiago and Pons, Jordi and Araz, R. Oguz and Scaini, Davide},
    title={Universal Speech Enhancement with Score-based Diffusion},
    howpublished={arXiv:2206.03065},
    month=sep,
    year=2022
}
```