AS · LG · NE · SD · ML · Feb 20, 2020

Efficient Trainable Front-Ends for Neural Speech Enhancement

arXiv:2002.09286v1 · 3 citations
AI Analysis

This work addresses the computational cost of trainable STFT front-ends in speech enhancement for resource-constrained applications. It is an incremental improvement.

The paper tackles the inefficiency of trainable STFT front-ends in neural speech enhancement by proposing a trainable front-end built on the butterfly mechanism of the Fast Fourier Transform, and reports accuracy and efficiency benefits for low-compute models.

Many neural speech enhancement and source separation systems operate in the time-frequency domain. Such models often benefit from making their Short-Time Fourier Transform (STFT) front-ends trainable. In current literature, these are implemented as large Discrete Fourier Transform matrices, which are prohibitively inefficient for low-compute systems. We present an efficient, trainable front-end based on the butterfly mechanism to compute the Fast Fourier Transform, and show its accuracy and efficiency benefits for low-compute neural speech enhancement models. We also explore the effects of making the STFT window trainable.
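To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the two ways a front-end can apply a Fourier transform to one windowed frame: a dense DFT matrix, and a standard radix-2 butterfly factorization. It is an illustration under stated assumptions, not the paper's implementation. The function names, the choice of a Hann window as the starting point, and the sharing of one twiddle vector per stage are all hypothetical. In a trainable front-end, the window and the twiddle factors would be the parameters a framework updates; here they are only initialized to their Fourier values.

```python
import numpy as np


def dft_matrix(n):
    """Dense n x n DFT matrix: the baseline a dense trainable front-end starts from."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def bit_reverse_permutation(n):
    """Input ordering required by the iterative radix-2 butterfly (n a power of two)."""
    bits = n.bit_length() - 1
    return np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)])


def init_twiddles(n):
    """One twiddle vector per butterfly stage, initialized to the FFT values.

    Assumption: in a learned front-end these would be the trainable parameters.
    """
    stages = []
    size = 2
    while size <= n:
        half = size // 2
        stages.append(np.exp(-2j * np.pi * np.arange(half) / size))
        size *= 2
    return stages


def butterfly_transform(x, twiddles):
    """Apply log2(n) butterfly stages along the last axis of x."""
    n = x.shape[-1]
    batch = x.shape[:-1]
    y = x[..., bit_reverse_permutation(n)].astype(complex)
    for w in twiddles:
        half = w.shape[0]
        # Split the signal into blocks of length 2 * half and combine each pair.
        blocks = y.reshape(*batch, n // (2 * half), 2 * half)
        even = blocks[..., :half]
        odd = blocks[..., half:] * w
        y = np.concatenate([even + odd, even - odd], axis=-1).reshape(*batch, n)
    return y


def front_end(frames, window, twiddles):
    """Windowed transform of a batch of frames: window first, then butterflies."""
    return butterfly_transform(frames * window, twiddles)


if __name__ == "__main__":
    n = 16
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((4, n))   # 4 hypothetical audio frames
    window = np.hanning(n)                 # assumed initial value of a trainable window
    twiddles = init_twiddles(n)

    fast = front_end(frames, window, twiddles)
    dense = (frames * window) @ dft_matrix(n)

    print("match:", np.allclose(fast, dense))
    print("dense parameters:", n * n)
    print("butterfly parameters:", sum(w.size for w in twiddles))
```

At initialization both paths compute the same transform. The dense path stores and multiplies by n^2 complex entries per frame, while the butterfly path uses log2(n) stages of n/2 complex multiplications each, which is where the efficiency argument for low-compute systems comes from.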

