SD · AI · AS · May 10, 2024

An Investigation of Incorporating Mamba for Speech Enhancement

Georgia Tech
arXiv:2405.06573v2 · 116 citations · h-index: 33 · SLT
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for applications like automatic speech recognition, but it is incremental as it applies an existing model (Mamba) to a new task.

This paper investigates the Mamba state-space model for speech enhancement. It reports a competitive PESQ of 3.55, a new state-of-the-art PESQ of 3.69 when combined with Perceptual Contrast Stretching, and a FLOPs reduction of up to ~12% compared with Transformer-based methods.

This work aims to investigate the use of a recently proposed, attention-free, scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. In particular, we employ Mamba to deploy different regression-based SE models (SEMamba) with different configurations, namely basic, advanced, causal, and non-causal. Furthermore, loss functions based either on signal-level distances or on metric-oriented objectives are considered. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared against equivalent Transformer-based SE solutions, a noticeable FLOPs reduction of up to ~12% is observed with the advanced non-causal configurations. Finally, SEMamba can be used as a pre-processing step before automatic speech recognition (ASR), showing competitive performance against recent SE solutions.
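To make the causal versus non-causal distinction in the abstract concrete, here is a minimal sketch of a plain diagonal linear state-space scan in Python. It is not the SEMamba implementation and it omits Mamba's input-dependent (selective) parameters. The function names `ssm_scan` and `bidirectional_scan`, the fixed parameters `a`, `b`, `c`, and the choice to sum the forward and backward passes are all assumptions made for illustration.

```python
from typing import List, Sequence


def ssm_scan(x: Sequence[float], a: Sequence[float],
             b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Causal scan of a diagonal linear SSM (illustrative, not Mamba itself).

    Recurrence per state dimension i:
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
        y_t    = sum_i c[i] * h_t[i]
    The cost grows linearly with the sequence length.
    """
    h = [0.0] * len(a)  # hidden state starts at zero
    y = []
    for x_t in x:
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


def bidirectional_scan(x: Sequence[float], a: Sequence[float],
                       b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Non-causal variant (assumed design): add a forward and a backward scan."""
    fwd = ssm_scan(x, a, b, c)
    bwd = ssm_scan(list(x)[::-1], a, b, c)[::-1]  # scan the reversed input, then undo the reversal
    return [f + g for f, g in zip(fwd, bwd)]


if __name__ == "__main__":
    impulse = [0.0, 1.0, 0.0]
    params = ([0.5], [1.0], [1.0])  # one state dimension, hypothetical values
    print(ssm_scan(impulse, *params))            # output at t=0 ignores the future impulse
    print(bidirectional_scan(impulse, *params))  # output at t=0 already reflects it
```

In the causal scan, each output depends only on past and current samples, which suits streaming use. The bidirectional variant also sees future samples, which is the general idea behind a non-causal configuration.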

Code Implementations: 1 repo