ASSD · Mar 21, 2019

Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

arXiv:1903.08876v1 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in audio processing for researchers and engineers, offering an incremental improvement by optimizing filterbank design to better align with DNN training assumptions.

The paper tackles the performance degradation in DNN-based sound source enhancement that arises when the statistical assumptions behind simple cost functions such as MSE do not hold. It proposes a data-driven method for designing a perfect-reconstruction filterbank that satisfies these assumptions. The learned frequency scale falls between those of the STFT and the wavelet transform, and the resulting system outperforms a standard STFT-based DNN with mel-scale compression.

We propose a data-driven design method for a perfect-reconstruction filterbank (PRFB) for sound-source enhancement (SSE) based on deep neural networks (DNNs). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable with a simple cost function such as mean-squared error (MSE) than with an advanced cost such as an objective sound quality assessment. However, such a simple cost function inherits strong assumptions on the statistics of the target and/or noise, which are often not satisfied, and this mismatch degrades performance. In this paper, we propose to design the frequency scale of the PRFB from training data so that the assumption underlying MSE is satisfied. To design the frequency scale, the warped filterbank frame (WFBF) is used as the PRFB. The frequency characteristic of the learned WFBF lay between those of the STFT and the wavelet transform, and its effectiveness was confirmed by comparison with a standard STFT-based DNN whose input feature is compressed into the mel scale.
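The baseline pipeline the abstract starts from (T-F masking in a perfect-reconstruction transform, scored with MSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: a plain STFT with a periodic Hann window at 50% overlap stands in for the PRFB (not the learned WFBF), an oracle ratio mask stands in for the DNN output, and the function names `stft` and `istft`, the test signal, and all parameter values are hypothetical.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Analysis: frame x with a periodic Hann window and take the real FFT."""
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=256, hop=128):
    """Synthesis by overlap-add. Periodic Hann windows at 50% overlap sum
    to one, so no synthesis window is applied (assumes hop == n_fft // 2)."""
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out = np.zeros(hop * (len(X) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out

# Toy data (assumption): a 440 Hz tone as the target, white noise as interference.
rng = np.random.default_rng(0)
t = np.arange(2048) / 16000.0
target = np.sin(2 * np.pi * 440.0 * t)
noise = 0.5 * rng.standard_normal(t.size)
mixture = target + noise

S, N, X = stft(target), stft(noise), stft(mixture)

# Perfect reconstruction holds away from the edges, where every sample
# is covered by two overlapping windows.
recon_error = np.max(np.abs(istft(X)[128:-128] - mixture[128:-128]))

# Oracle ratio mask in [0, 1]; a trained DNN would estimate this from X.
mask = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-12)
enhanced = istft(mask * X)

# MSE in the T-F domain, the simple cost function the abstract refers to.
mse_masked = np.mean(np.abs(mask * X - S) ** 2)
mse_unprocessed = np.mean(np.abs(X - S) ** 2)
```

In this sketch the frequency axis is the uniform one of the STFT. The abstract's proposal concerns replacing that fixed scale with one learned from data while keeping the perfect-reconstruction property shown by `recon_error`.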

