AS LG SD
Jul 27, 2020

On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network

Farnood Faraji, Yazid Attabi, Benoit Champagne, Wei-Ping Zhu

arXiv:2007.13258v11.2

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for audio processing applications. It is incremental: it builds on existing GAN methods and contributes a new feature combination.

The paper tackles speech enhancement by combining Audio FingerPrinting (AFP) features (MFCC and NSSC) with a Generative Adversarial Network. In experiments with diverse speakers and noise types, it reports the best objective performance while reducing memory requirements and training time.

The advent of learning-based methods in speech enhancement has revived the need for robust and reliable training features that can compactly represent speech signals while preserving their vital information. Time-frequency domain features, such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), are preferred in many approaches. While the MFCC provide a compact representation, they ignore the dynamics and distribution of energy in each mel-scale subband. In this work, a speech enhancement system based on a Generative Adversarial Network (GAN) is implemented and tested with a combination of Audio FingerPrinting (AFP) features obtained from the MFCC and the Normalized Spectral Subband Centroids (NSSC). The NSSC capture the locations of speech formants and complement the MFCC in a crucial way. In experiments with diverse speakers and noise types, GAN-based speech enhancement with the proposed AFP feature combination achieves the best objective performance while reducing memory requirements and training time.
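To make the idea of subband centroids concrete, here is a minimal sketch of how a normalized spectral subband centroid could be computed for one speech frame. It assumes a common textbook definition (the power-weighted mean frequency inside each band, rescaled to [0, 1] by the band edges) with rectangular, mel-spaced bands. The function name `normalized_subband_centroids`, the band count, and the FFT size are illustrative choices, not taken from the paper, whose exact NSSC formulation may differ.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def normalized_subband_centroids(frame, sample_rate, n_bands=8, n_fft=512):
    """Power-weighted mean frequency per band, scaled to [0, 1] within the band."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    # Rectangular bands with edges equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 1))

    centroids = np.empty(n_bands)
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        mask = (freqs >= lo) & (freqs < hi)
        energy = power[mask].sum()
        if energy > 0:
            centroid = (freqs[mask] * power[mask]).sum() / energy
        else:
            # No energy in the band: fall back to the band midpoint.
            centroid = 0.5 * (lo + hi)
        centroids[b] = (centroid - lo) / (hi - lo)
    return centroids


# Usage: a 25 ms frame of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
nssc = normalized_subband_centroids(tone, sample_rate)
```

In this sketch, the band that contains a strong spectral peak (such as a formant) has its centroid pulled toward that peak, which is the kind of within-band energy-location information that the MFCC alone do not preserve. A feature vector in the spirit of the abstract would concatenate such centroids with the MFCC of the same frame.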
