Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising
This work addresses the problem of speech denoising for applications requiring real-time and interpretable processing, such as hearing aids or voice assistants.
The proposed TVF model has 1 million parameters and adapts to changing noise conditions in real time. Its performance is demonstrated on the Valentini-Botinhao dataset.
We present TVF (Time-Varying Filtering), a low-latency speech enhancement model with 1 million parameters. Combining the interpretability of Digital Signal Processing (DSP) with the adaptability of deep learning, TVF bridges the gap between traditional filtering and modern neural speech modeling. The model uses a lightweight neural network backbone to predict, in real time, the coefficients of a differentiable 35-band IIR filter cascade, which lets it adapt dynamically to non-stationary noise. Unlike "black-box" deep learning approaches, TVF offers a fully interpretable processing chain in which spectral modifications are explicit and adjustable. We evaluate this approach on a speech denoising task using the Valentini-Botinhao dataset, comparing it against a static DDSP approach and a fully deep-learning-based solution; the results show that TVF adapts effectively to changing noise conditions.
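To make the idea of a time-varying IIR filter cascade concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each band is a peaking biquad built from the widely used RBJ Audio EQ Cookbook formulas, that only the per-band gain changes from frame to frame, and that coefficients are held constant within a frame. The function names, the Q value, and the `frame_gains_db` input (a stand-in for the network's per-frame predictions) are all hypothetical; the abstract does not specify the filter type, band layout, or update rate.

```python
import math


def peaking_biquad(f0, gain_db, q, fs):
    """RBJ-cookbook peaking-EQ coefficients, normalized so that a0 == 1.

    Returns ((b0, b1, b2), (a1, a2)). This filter type is an assumption.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = ((1.0 + alpha * amp) / a0,
         -2.0 * math.cos(w0) / a0,
         (1.0 - alpha * amp) / a0)
    a = (-2.0 * math.cos(w0) / a0,
         (1.0 - alpha / amp) / a0)
    return b, a


def time_varying_cascade(x, frame_gains_db, centers_hz, fs, frame_len, q=4.0):
    """Run x through a cascade of biquads whose gains change once per frame.

    frame_gains_db[k][i] is the gain (dB) of band i during frame k; in a
    learned system these values would come from the network (hypothetical).
    Filter state is carried across frame boundaries, so the output stays
    continuous when the coefficients change.
    """
    # One transposed direct-form II state pair per band.
    state = [[0.0, 0.0] for _ in centers_hz]
    coeffs = []
    y = []
    for n, sample in enumerate(x):
        if n % frame_len == 0:
            # New frame: refresh coefficients (last frame is reused if x is longer).
            k = min(n // frame_len, len(frame_gains_db) - 1)
            coeffs = [peaking_biquad(f, g, q, fs)
                      for f, g in zip(centers_hz, frame_gains_db[k])]
        v = sample
        for i, ((b0, b1, b2), (a1, a2)) in enumerate(coeffs):
            s1, s2 = state[i]
            out = b0 * v + s1
            state[i][0] = b1 * v - a1 * out + s2
            state[i][1] = b2 * v - a2 * out
            v = out  # output of this band feeds the next band
        y.append(v)
    return y
```

Calling `time_varying_cascade(x, gains, centers, 16000, 160)` with 35 center frequencies and one list of 35 gains per 10 ms frame would mimic the structure described above. Because every band reduces to a center frequency and a gain in dB, the processing applied to any frame can be read off directly, which is the sense in which such a chain is interpretable. A trainable version would express the same recursion in an automatic-differentiation framework so that gradients can flow back to the coefficient predictor.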