---
license: mit
pipeline_tag: audio-to-audio
tags:
- denoising
- speech enhancement
- speech separation
- noise suppression
- realtime
---

This is a pre-trained checkpoint of Fast FullSubNet, a real-time speech denoising model trained on the dataset of the INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge ([DNS-INTERSPEECH-2020](https://github.com/microsoft/DNS-Challenge/tree/interspeech2020/master)).

## How to run

Follow the getting-started guide in the FullSubNet documentation: https://fullsubnet.readthedocs.io/en/latest/usage/getting_started.html

## Code

The source code is available at https://github.com/Audio-WestlakeU/FullSubNet

Note: The code doesn't support real-time streaming out of the box. See [issue-67](https://github.com/Audio-WestlakeU/FullSubNet/issues/67) for details.

## Paper

[Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement](https://arxiv.org/abs/2212.09019), Xiang Hao, Xiaofei Li

> For many speech enhancement applications, a key feature is that system runs on a real-time, latency-sensitive, battery-powered platform, which strictly limits the algorithm latency and computational complexity. In this work, we propose a new architecture named Fast FullSubNet dedicated to accelerating the computation of FullSubNet. Specifically, Fast FullSubNet processes sub-band speech spectra in the mel-frequency domain by using cascaded linear-to-mel full-band, sub-band, and mel-to-linear full-band models such that frequencies involved in the sub-band computation are vastly reduced. After that, a down-sampling operation is proposed for the sub-band input sequence to further reduce the computational complexity along the time axis. Experimental results show that, compared to FullSubNet, Fast FullSubNet has only 13\% computational complexity and 16\% processing time, and achieves comparable or even better performance.

## Performance

| Method | With Reverb: WB-PESQ | With Reverb: NB-PESQ | With Reverb: SI-SDR (dB) | With Reverb: STOI | No Reverb: WB-PESQ | No Reverb: NB-PESQ | No Reverb: SI-SDR (dB) | No Reverb: STOI |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Fast FullSubNet (118 Epochs) | 2.882 | 3.42 | 15.33 | 0.9233 | 2.694 | 3.222 | 16.34 | 0.9571 |
| [FullSubNet (58 Epochs)](https://github.com/Audio-WestlakeU/FullSubNet/releases/tag/v0.2) (just for comparison) | 2.987 | 3.496 | 15.756 | 0.926 | 2.889 | 3.385 | 17.635 | 0.964 |
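
For readers unfamiliar with the SI-SDR column, the following is a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio, in dB. It is not the evaluation script used to produce the numbers above. The function name `si_sdr`, the `eps` stabiliser, and the use of plain Python lists are assumptions made for illustration; real evaluations would typically run on NumPy or PyTorch arrays.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Hypothetical helper for illustration; `eps` guards against division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n

    # Remove the DC offset from both signals.
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Synthetic example: a sine "clean" signal plus two levels of additive interference.
clean = [math.sin(0.1 * i) for i in range(1000)]
interference = [math.cos(0.37 * i) for i in range(1000)]
light = [c + 0.1 * v for c, v in zip(clean, interference)]
heavy = [c + 0.5 * v for c, v in zip(clean, interference)]

print(f"light interference: {si_sdr(light, clean):.2f} dB")
print(f"heavy interference: {si_sdr(heavy, clean):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the enhanced signal leaves the score unchanged, which is why the metric is called scale-invariant. Higher values are better.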