SD AI AS | May 10, 2024

An Investigation of Incorporating Mamba for Speech Enhancement

Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao

Georgia Tech

arXiv:2405.06573v228.7123 citations | h-index: 33 | Has Code | SLT

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for applications like automatic speech recognition, but it is incremental as it applies an existing model (Mamba) to a new task.

This paper investigates the Mamba state-space model for speech enhancement. SEMamba achieves a competitive PESQ of 3.55, rising to a new state-of-the-art PESQ of 3.69 when combined with Perceptual Contrast Stretching, and reduces FLOPs by up to ~12% compared with Transformer-based methods.

This work aims to investigate the use of a recently proposed, attention-free, scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. In particular, we employ Mamba to deploy different regression-based SE models (SEMamba) with different configurations, namely basic, advanced, causal, and non-causal. Furthermore, loss functions that are either based on signal-level distances or metric-oriented are considered. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared against equivalent Transformer-based SE solutions, a noticeable FLOPs reduction of up to ~12% is observed with the advanced, non-causal configurations. Finally, SEMamba can be used as a pre-processing step before automatic speech recognition (ASR), showing competitive performance against recent SE solutions.
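To make the causal versus non-causal distinction in the abstract concrete, the following is a minimal sketch of a scalar linear state-space recurrence, not the SEMamba implementation. It assumes fixed parameters a, b, and c, whereas Mamba makes its parameters input-dependent and uses a parallel scan. The function names are hypothetical, and summing a forward and a backward scan is only one common way to build a non-causal variant; the abstract does not say how SEMamba does it.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal scan of a scalar linear SSM (illustrative, fixed parameters).

    h[t] = a * h[t-1] + b * x[t]
    y[t] = c * h[t]
    """
    h = 0.0
    y = []
    for x_t in x:
        # The state depends only on current and past inputs, so y[t] is causal.
        h = a * h + b * x_t
        y.append(c * h)
    return y


def bidirectional_ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Non-causal variant (assumption): sum of a forward and a backward scan."""
    forward = ssm_scan(x, a, b, c)
    # Scan the time-reversed input, then flip the result back into time order.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


# A unit impulse in the middle of a short sequence.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
causal_out = ssm_scan(impulse, a=0.5, b=1.0, c=1.0)
noncausal_out = bidirectional_ssm_scan(impulse, a=0.5, b=1.0, c=1.0)
print(causal_out)     # response appears only at and after the impulse
print(noncausal_out)  # response spreads to both sides of the impulse
```

In the causal output the frames before the impulse stay at zero, which is what a streaming enhancer requires. The bidirectional output is non-zero on both sides, reflecting that a non-causal model may use future frames.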
