SD · AI · AS · Jan 10, 2025

xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement

arXiv:2501.06146v2 · 6 citations · h-index: 5 · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for audio processing applications, presenting an incremental improvement by applying a novel architecture to an existing task.

The paper tackles single-channel speech enhancement by introducing xLSTM-SENet, the first system based on the Extended Long Short-Term Memory (xLSTM) architecture, which outperforms state-of-the-art Mamba- and Conformer-based systems on the VoiceBank+DEMAND dataset.

While attention-based architectures, such as Conformers, excel in speech enhancement, they face challenges such as scalability with respect to input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture offers linear scalability. However, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis reveals that xLSTM, and notably even LSTM, can match or outperform state-of-the-art Mamba- and Conformer-based systems across various model sizes in speech enhancement on the VoiceBank+DEMAND dataset. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, that contribute to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems of similar complexity on the VoiceBank+DEMAND dataset.
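The abstract credits exponential gating and bidirectionality and contrasts linear scaling with attention. The following is a minimal scalar sketch of those three ideas, not the xLSTM-SENet implementation. It assumes the stabilized exponential-gate recurrence commonly given for xLSTM cells (cell state, normalizer state, and a log-space stabilizer) and omits the output gate, learned projections, and everything specific to speech. The names `exp_gated_scan` and `bidirectional_scan` are hypothetical, and pairing the two directions per time step is an assumption about how they might be combined.

```python
import math

def exp_gated_scan(z, i_pre, f_pre):
    """Single left-to-right pass with exponential input and forget gates.

    z, i_pre, f_pre: equal-length lists of floats (cell inputs and gate
    pre-activations, treated here as log-gates). Returns c_t / n_t per step.
    Cost is one loop over the sequence, i.e. linear in its length.
    """
    c, n, m = 0.0, 0.0, -math.inf  # cell state, normalizer, log-space stabilizer
    out = []
    for z_t, i_t, f_t in zip(z, i_pre, f_pre):
        # Stabilizer: track the largest log-magnitude so exp() never overflows.
        m_new = max(f_t + m, i_t)
        i_gate = math.exp(i_t - m_new)      # exponent is always <= 0
        f_gate = math.exp(f_t + m - m_new)  # exponent is always <= 0
        c = f_gate * c + i_gate * z_t
        n = f_gate * n + i_gate
        out.append(c / n)  # normalized state: a positive-weighted average of inputs
        m = m_new
    return out

def bidirectional_scan(z, i_pre, f_pre):
    """Run the scan forward and on the time-reversed sequence, then pair per step."""
    fwd = exp_gated_scan(z, i_pre, f_pre)
    bwd = exp_gated_scan(z[::-1], i_pre[::-1], f_pre[::-1])[::-1]
    return list(zip(fwd, bwd))

if __name__ == "__main__":
    frames = [0.5, -1.0, 2.0, 0.0]
    print(bidirectional_scan(frames, [0.0, 1.0, -1.0, 0.5], [0.0, 0.0, 0.0, 0.0]))
```

Because each output is a weighted average of the inputs seen so far in that direction, the forward value at a frame depends only on past frames and the backward value only on future frames, which is why a bidirectional model sees the whole utterance.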

Code Implementations: 1 repo
