SD | AI | AS | Jun 1, 2025

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement

arXiv:2506.01023v1 | 1 citation | h-index: 15 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for real-time applications, offering an incremental improvement in efficiency and performance over existing methods.

The paper tackles real-time speech enhancement with a two-stage hierarchical deep filtering framework. It combines sub-band processing with decoupled (temporal and frequency) deep filtering to exploit information from the time-frequency bins surrounding each target bin, and it reports better performance than other advanced systems while using fewer resources.

This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its surrounding TF bins. To further improve the model performance, we decouple deep filtering into temporal and frequency components and introduce a two-stage framework, reducing the complexity of filter coefficient prediction at each stage. Additionally, we propose the TAConv module to strengthen convolutional feature extraction. Experimental results demonstrate that the proposed hierarchical deep filtering network (HDF-Net) effectively utilizes surrounding TF bin information and outperforms other advanced systems while using fewer resources.
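The decoupling of deep filtering into temporal and frequency components can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `temporal_filter` and `frequency_filter`, the tap counts, the causal (past-frames-only) temporal window, and the order in which the two filters are applied are all assumptions made for illustration. In a real system the complex coefficients would be predicted by the network; here they are plain input arrays.

```python
import numpy as np


def temporal_filter(spec, coefs):
    """Causal temporal deep filtering (illustrative sketch).

    spec:  complex array of shape (T, F), the noisy spectrogram.
    coefs: complex array of shape (T, F, N); tap n weights frame t - n.
    """
    T, _ = spec.shape
    N = coefs.shape[-1]
    # Prepend N-1 zero frames so frame t only sees frames t-N+1 .. t.
    padded = np.pad(spec, ((N - 1, 0), (0, 0)))
    out = np.zeros_like(spec)
    for n in range(N):
        # This slice is the spectrogram delayed by n frames.
        out += coefs[:, :, n] * padded[N - 1 - n : N - 1 - n + T]
    return out


def frequency_filter(spec, coefs):
    """Frequency deep filtering over neighbouring bins (illustrative sketch).

    spec:  complex array of shape (T, F).
    coefs: complex array of shape (T, F, 2K+1); tap index k + K weights
           bin f + k for k in -K .. K.
    """
    _, F = spec.shape
    K = (coefs.shape[-1] - 1) // 2
    # Zero-pad K bins on both sides of the frequency axis.
    padded = np.pad(spec, ((0, 0), (K, K)))
    out = np.zeros_like(spec)
    for k in range(-K, K + 1):
        # This slice is the spectrogram shifted by k bins.
        out += coefs[:, :, k + K] * padded[:, K + k : K + k + F]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, N, K = 8, 6, 3, 1  # hypothetical sizes
    noisy = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
    t_coefs = rng.standard_normal((T, F, N)) + 0j
    f_coefs = rng.standard_normal((T, F, 2 * K + 1)) + 0j
    # Assumed ordering: temporal filtering first, then frequency filtering.
    enhanced = frequency_filter(temporal_filter(noisy, t_coefs), f_coefs)
    print(enhanced.shape)
```

Under these assumptions, each TF bin needs N + (2K + 1) coefficients across the two filters, whereas a joint two-dimensional filter over the same neighbourhood would need N × (2K + 1). This is one way to read the abstract's claim that decoupling reduces the complexity of coefficient prediction at each stage.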
