ASSD · Nov 25, 2019

Invertible DNN-based nonlinear time-frequency transform for speech enhancement

arXiv:1911.10764v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for audio processing applications. It is an incremental advance: it builds on existing end-to-end methods by making the learned time-frequency transform invertible.

The authors tackled speech enhancement with an end-to-end method whose time-frequency transform is trainable, nonlinear, and built from invertible deep neural networks, so that perfect reconstruction is guaranteed as a key property.

We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). Recent progress in speech enhancement has been driven by DNNs. Conventional DNN-based speech enhancement applies a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask with a DNN. Other methods instead use end-to-end networks that directly estimate the enhanced signal without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to interpret. Therefore, some end-to-end methods use a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may lack properties that are important in ordinary signal processing. In this paper, we consider perfect reconstruction as such a property. An invertible nonlinear T-F transform is constructed from DNNs and learned from data so that the resulting transform is a perfect-reconstruction filterbank.
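The abstract does not describe the network architecture, so the following is only a minimal sketch of one common way to make a nonlinear DNN transform exactly invertible: additive coupling layers (the NICE construction of Dinh et al.), which may differ from the authors' design. Everything here is an assumption for illustration: non-overlapping frames of `frame_len` samples stand in for the analysis framing, random weights stand in for trained parameters, and `AdditiveCoupling`, `InvertibleTransform`, and the sigmoid mask are hypothetical names. Each layer leaves half of a frame unchanged and adds a nonlinear function of it to the other half, so subtracting the same function undoes the layer exactly.

```python
import numpy as np


class AdditiveCoupling:
    """NICE-style additive coupling layer: exactly invertible for any inner network."""

    def __init__(self, dim, hidden, rng):
        if dim % 2:
            raise ValueError("frame length must be even so it can be split in half")
        half = dim // 2
        # Random weights stand in for trained parameters (hypothetical).
        self.w1 = rng.standard_normal((half, hidden)) / np.sqrt(half)
        self.w2 = rng.standard_normal((hidden, half)) / np.sqrt(hidden)

    def _net(self, a):
        # Arbitrary nonlinear map; it is only evaluated, never inverted.
        return np.tanh(a @ self.w1) @ self.w2

    def forward(self, x):
        a, b = np.split(x, 2, axis=-1)
        return np.concatenate([a, b + self._net(a)], axis=-1)

    def inverse(self, y):
        a, b = np.split(y, 2, axis=-1)
        return np.concatenate([a, b - self._net(a)], axis=-1)


class InvertibleTransform:
    """Hypothetical nonlinear analysis/synthesis pair built from coupling layers."""

    def __init__(self, frame_len=64, n_layers=4, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.frame_len = frame_len
        self.layers = [AdditiveCoupling(frame_len, hidden, rng) for _ in range(n_layers)]

    def analysis(self, signal):
        if signal.size % self.frame_len:
            raise ValueError("signal length must be a multiple of frame_len")
        # Non-overlapping framing (a simplification of STFT-style framing).
        coeffs = signal.reshape(-1, self.frame_len)
        for layer in self.layers:
            # Reverse each frame after every layer so both halves get transformed.
            coeffs = layer.forward(coeffs)[..., ::-1]
        return coeffs

    def synthesis(self, coeffs):
        frames = coeffs
        for layer in reversed(self.layers):
            # Undo the reversal, then undo the coupling layer.
            frames = layer.inverse(frames[..., ::-1])
        return frames.reshape(-1)


transform = InvertibleTransform(frame_len=64)
rng = np.random.default_rng(1)
x = rng.standard_normal(64 * 250)  # stand-in for a noisy waveform

coeffs = transform.analysis(x)
# Perfect reconstruction, up to floating-point rounding.
assert np.allclose(transform.synthesis(coeffs), x)

# Placeholder mask in the learned domain; a trained DNN would estimate it.
mask = 1.0 / (1.0 + np.exp(-rng.standard_normal(coeffs.shape)))
enhanced = transform.synthesis(mask * coeffs)
```

Because the inner network `_net` is only evaluated and never inverted, it can be any nonlinear model without breaking invertibility; the only reconstruction error comes from floating-point rounding. Replacing the random placeholder mask with a trained estimator would turn this into a masking pipeline in the learned domain, analogous to STFT masking.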

Code implementations: 1 repo