GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning

Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

Fraunhofer IIS, Am Wolfsmantel 33, 91058 Erlangen, Germany

{shrishti.saha.shetu, emanuel.habets, andreas.brendel}@iis.fraunhofer.de

Abstract

Enhancing speech quality under adverse SNR conditions remains a significant challenge for discriminative deep neural network (DNN)-based approaches. In this work, we propose DisCoGAN, a time-frequency-domain generative adversarial network (GAN) conditioned on the latent features of a discriminative model pre-trained for speech enhancement in low SNR scenarios. Our proposed method outperforms state-of-the-art discriminative methods and also surpasses end-to-end (E2E) trained GAN models. We also investigate various configurations for conditioning the proposed GAN model on the discriminative model and assess their influence on the quality of the enhanced speech.

Evaluation Scenarios

In our work, we compare our proposed method with different state-of-the-art (SOTA) generative and discriminative deep-learning-based noise reduction methods in various SNR scenarios.
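The Low SNR items below are labeled with the SNR of their noisy input. SNR in dB is 10·log10(P_speech / P_noise), so -15 dB means the noise power is roughly 31.6 times the speech power. The following is a minimal Python sketch, assuming the common convention of scaling a noise recording to reach a target SNR before adding it to clean speech. The function names `mix_at_snr` and `snr_db` are hypothetical, the sine tone is only a stand-in for speech, and this is not necessarily how the samples on this page were generated.

```python
import math
import random


def power(signal):
    """Mean power of a signal (average of the squared samples)."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """SNR in dB: 10 * log10(P_speech / P_noise)."""
    return 10 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Hypothetical helper: scale `noise` to hit `target_snr_db`, then add it to `speech`.

    Assumes both signals have the same length and non-zero power.
    Returns the noisy mixture and the scaled noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Noise power required for the target SNR: P_n = P_s / 10^(SNR/10)
    target_noise_power = power(speech) / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Usage: a 1 s, 220 Hz tone at 16 kHz (stand-in for speech) mixed with Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
mixture, scaled_noise = mix_at_snr(speech, noise, -15.0)
print(f"achieved SNR: {snr_db(speech, scaled_noise):.2f} dB")
```

Scaling only the noise keeps the speech level fixed, so mixtures at different SNRs differ only in how loud the interference is.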

Below you can find processed samples for the different methods:

1. Low SNR Dataset

2. VCTK Dataset


1. Low SNR Dataset

Item 1 (SNR: -4dB, Speaker: Female)

Item 2 (SNR: -9dB, Speaker: Male)

Item 3 (SNR: -15dB, Speaker: Female)

Item 4 (SNR: -2dB, Speaker: Male)

Item 5 (SNR: -2dB, Speaker: Male)

Item 6 (SNR: -19dB, Speaker: Male)

Item 7 (SNR: -5dB, Speaker: Female)

Item 8 (SNR: -9dB, Speaker: Female)

Item 9 (SNR: -4dB, Speaker: Female)

Item 10 (SNR: -8dB, Speaker: Male)

2. VCTK Dataset

Item 1

Item 2

Item 3

Item 4

Item 5

Item 6

Item 7

Item 8

Item 9

Item 10

Conditions of Use

1. Fraunhofer IIS generated this sound material based on publicly available material from the VCTK dataset, the DNS Challenge, and ESC-50.

2. The content has been processed in accordance with generally accepted rules of technology and with scientific care, but no guarantee is given that any expected feature is actually attained.

3. With the exception of willful intent or gross negligence, Fraunhofer IIS shall not be liable for Open Source software or other third-party software being free from any error or claim, or for its fitness for a particular purpose, even if such software is included within the Sound Material.

4. The Sound Material shall only be used for testing and appreciating noise reduction techniques and shall not be copied, publicly transmitted, distributed, lent, or modified for any other purpose.

5. No representations or warranties are made or implied regarding the accuracy, non-infringement, or fitness for a particular purpose of the Sound Material.

6. The copyright and permission notice shall be duplicated whenever the Sound Material is copied, distributed, or publicly transmitted.

7. The Sound Material shall not be distributed for a fee.