SD | AI | CV | AS | Oct 14, 2024

CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning

arXiv:2410.11062v2 | 3 citations | h-index: 6 | Has Code | ISCAS
Originality: Incremental advance
AI Analysis

This paper addresses efficient speech denoising for real-time applications and offers an incremental improvement in model efficiency.

The paper tackles real-time audio denoising by proposing CleanUMamba, a compact neural network that combines Mamba with channel pruning and achieves a PESQ score of 2.42 and an STOI of 95.1% with only 442K parameters.

This paper presents CleanUMamba, a time-domain neural network architecture designed for real-time causal audio denoising directly applied to raw waveforms. CleanUMamba leverages a U-Net encoder-decoder structure, incorporating the Mamba state-space model in the bottleneck layer. By replacing conventional self-attention and LSTM mechanisms with Mamba, our architecture offers superior denoising performance while maintaining a constant memory footprint, enabling streaming operation. To enhance efficiency, we applied structured channel pruning, achieving an 8X reduction in model size without compromising audio quality. Our model demonstrates strong results in the Interspeech 2020 Deep Noise Suppression challenge. Specifically, CleanUMamba achieves a PESQ score of 2.42 and STOI of 95.1% with only 442K parameters and 468M MACs, matching or outperforming larger models in real-time performance. Code will be available at: https://github.com/lab-emi/CleanUMamba
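
The abstract mentions structured channel pruning but does not state the pruning criterion, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an L1-magnitude criterion (a common choice, swapped in here for illustration) and two stacked linear layers stored as plain nested lists. The names `channel_scores`, `prune_channels`, and `param_count` are hypothetical helpers. The point it illustrates is that removing an output channel of one layer also removes the matching input channel of the next layer, which is why the model actually shrinks instead of merely becoming sparse.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # weight[out_channel][in_channel]


def channel_scores(weight: Matrix) -> List[float]:
    """Score each output channel (row) by the L1 norm of its weights.

    Assumption: L1 magnitude is used as the importance measure; the
    paper's abstract does not say which criterion it uses.
    """
    return [sum(abs(w) for w in row) for row in weight]


def prune_channels(w1: Matrix, w2: Matrix, keep: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Keep the `keep` highest-scoring output channels of layer 1.

    The same channel indices are removed from the input side of layer 2,
    so the two layers stay shape-compatible after pruning.
    """
    scores = channel_scores(w1)
    ranked = sorted(range(len(w1)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore original channel order
    w1_pruned = [w1[i] for i in kept]
    w2_pruned = [[row[i] for i in kept] for row in w2]
    return w1_pruned, w2_pruned, kept


def param_count(*weights: Matrix) -> int:
    """Total number of weights across the given layers."""
    return sum(len(w) * len(w[0]) for w in weights)


# Toy example: layer 1 maps 2 -> 4 channels, layer 2 maps 4 -> 3 channels.
w1 = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
w2 = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]

w1_small, w2_small, kept = prune_channels(w1, w2, keep=2)
before = param_count(w1, w2)
after = param_count(w1_small, w2_small)
```

In this toy case the two low-magnitude channels (indices 1 and 3) are dropped and the weight count halves. In practice, pruning of this kind is typically followed by fine-tuning to recover quality; the abstract reports the end result (an 8X size reduction) but not the schedule used.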

Code Implementations: 1 repo
