SD · AS · Oct 17, 2019

End-to-end speech enhancement based on discrete cosine transform

arXiv:1910.07840v4 · 19 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses speech enhancement for audio processing applications, but it appears incremental: it builds on existing DNN-based methods, with a focus on computational efficiency.

The paper tackles speech enhancement by using the Discrete Cosine Transform (DCT) to reconstruct a valid short-time spectrum, and reports strong performance with a U-net architecture.
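The core idea, that a short-time DCT yields a real-valued time-frequency representation whose inverse is exact without any phase estimate, can be sketched in NumPy/SciPy. This is a minimal sketch under stated assumptions: the listing does not give the paper's frame length, window, or DCT variant, so the orthonormal DCT-II, the square-root Hann window at 50% overlap, and the 512-sample (even) frame length are illustrative choices, and `stdct`/`istdct` are hypothetical helper names.

```python
import numpy as np
from scipy.fft import dct, idct

def sqrt_hann(frame_len: int) -> np.ndarray:
    """Square root of a periodic Hann window; its square sums to 1 at 50% overlap."""
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))

def stdct(x: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Hypothetical short-time DCT: windowed frames (hop = frame_len // 2) -> DCT-II.

    Returns a real array of shape (n_frames, frame_len); there is no phase term.
    Assumes an even frame_len.
    """
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    # Zero-pad so every original sample is covered by exactly two overlapping frames.
    xp = np.concatenate([np.zeros(hop), x, np.zeros(hop + (-len(x)) % hop)])
    n_frames = 1 + (len(xp) - frame_len) // hop
    frames = np.stack([xp[i * hop : i * hop + frame_len] * win for i in range(n_frames)])
    return dct(frames, type=2, norm="ortho", axis=-1)

def istdct(S: np.ndarray, length: int) -> np.ndarray:
    """Hypothetical inverse: orthonormal inverse DCT, synthesis window, overlap-add."""
    frame_len = S.shape[-1]
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    frames = idct(S, type=2, norm="ortho", axis=-1) * win
    out = np.zeros((len(frames) - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out[hop : hop + length]  # drop the padding added in stdct

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # stand-in for one second of 16 kHz audio
S = stdct(x)                    # real-valued spectrogram a network could enhance
y = istdct(S, len(x))
print(S.shape, np.isrealobj(S), np.allclose(x, y))  # (64, 512) True True
```

Because the representation is real-valued, an enhancement network can estimate the coefficients directly and resynthesize them with the inverse transform, without borrowing a phase from the noisy input.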

Previous speech enhancement methods focus on estimating the short-time spectrum of speech signals because speech is approximately stationary over short intervals. However, these methods often estimate only the clean magnitude spectrum and reuse the noisy phase when resynthesizing the speech signal, and the resulting combination is unlikely to be a valid short-time Fourier transform (STFT). Recent DNN-based speech enhancement methods mainly estimate the magnitude and phase spectra jointly. These methods usually outperform magnitude-only estimation but require much more computation and memory. In this paper, we propose using the Discrete Cosine Transform (DCT) to reconstruct a valid short-time spectrum. Using a U-net structure, we enhance the real-valued spectrogram and achieve strong performance.
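The point about reusing the noisy phase can be checked numerically: a spectrogram is a valid (consistent) STFT only if taking its inverse STFT and re-analysing the result returns the same spectrogram. The sketch below is an illustration, not the paper's code; it assumes a square-root Hann window at 50% overlap, uses white noise as a synthetic stand-in for clean speech, and defines hypothetical helpers `stft`, `istft`, and `inconsistency`.

```python
import numpy as np

def sqrt_hann(frame_len: int) -> np.ndarray:
    """Square root of a periodic Hann window; its square sums to 1 at 50% overlap."""
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))

def stft(x: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Hypothetical STFT: zero-padded, windowed frames (hop = frame_len // 2) -> rfft."""
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    xp = np.concatenate([np.zeros(hop), x, np.zeros(hop + (-len(x)) % hop)])
    n_frames = 1 + (len(xp) - frame_len) // hop
    frames = np.stack([xp[i * hop : i * hop + frame_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(X: np.ndarray, length: int) -> np.ndarray:
    """Hypothetical inverse STFT: irfft, synthesis window, overlap-add, trim padding."""
    frame_len = 2 * (X.shape[-1] - 1)
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    frames = np.fft.irfft(X, n=frame_len, axis=-1) * win
    out = np.zeros((len(frames) - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out[hop : hop + length]

def inconsistency(X: np.ndarray, length: int) -> float:
    """Relative distance between X and STFT(ISTFT(X)); near 0 only for a valid STFT."""
    X_re = stft(istft(X, length))
    return float(np.linalg.norm(X_re - X) / np.linalg.norm(X))

rng = np.random.default_rng(0)
n = 16000
clean = rng.standard_normal(n)          # synthetic stand-in for clean speech
noisy = clean + rng.standard_normal(n)  # additive noise

X_clean = stft(clean)
X_noisy = stft(noisy)
# Oracle clean magnitude combined with the noisy phase.
X_mix = np.abs(X_clean) * np.exp(1j * np.angle(X_noisy))

print(inconsistency(X_clean, n))  # close to 0 (floating-point error only)
print(inconsistency(X_mix, n))    # clearly greater than 0
```

Even with the oracle clean magnitude, pairing it with the noisy phase gives a spectrogram that no time-domain signal produces, so the STFT of the resynthesized waveform no longer matches what was estimated.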

Code Implementations: 3 repos
