SD | AI | AS | Aug 20, 2025

Mamba2 Meets Silence: Robust Vocal Source Separation for Sparse Regions

arXiv:2508.14556v1 | h-index: 1 | ICTC
Originality: Incremental advance
AI Analysis

This work addresses robust vocal isolation in music source separation. It is an incremental advance, with reported gains in both cSDR and uSDR.

The paper tackles vocal source separation in music, particularly for intermittently occurring vocals. It introduces a model that combines Mamba2 with band-splitting and a dual-path architecture, and reports a cSDR of 11.03 dB, the best reported to date, along with substantial gains in uSDR.
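To make the band-splitting and dual-path idea more concrete, here is a minimal pure-Python sketch. It is not the paper's architecture: the names `split_bands`, `band_features`, `running_mean`, and `dual_path_block` are hypothetical, the per-band mean stands in for a learned projection, and a running mean stands in for a learned sequence layer such as Mamba2. The point is only the data flow: a spectrogram becomes a bands-by-time grid, and sequence modeling alternates between the time axis and the band axis, so each pass sees a sequence of length T or K rather than T times F.

```python
def split_bands(frame, band_edges):
    """Split one spectrogram frame (a list of bin values) into sub-bands."""
    return [frame[lo:hi] for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def band_features(spectrogram, band_edges):
    """Map a (T x F) spectrogram to a (K x T) grid, one value per band per frame.

    Assumption: the per-band mean is a stand-in for a learned projection.
    """
    per_frame = [
        [sum(band) / len(band) for band in split_bands(frame, band_edges)]
        for frame in spectrogram
    ]
    # Transpose from time-major (T x K) to band-major (K x T).
    return [list(col) for col in zip(*per_frame)]


def running_mean(seq):
    """Placeholder sequence model: a causal running mean over the sequence."""
    out, total = [], 0.0
    for count, value in enumerate(seq, start=1):
        total += value
        out.append(total / count)
    return out


def dual_path_block(grid, seq_model=running_mean):
    """One dual-path pass over a (K x T) grid."""
    # Path 1: model along time, independently for each band.
    grid = [seq_model(row) for row in grid]
    # Path 2: model along bands, independently for each time frame.
    cols = [seq_model(list(col)) for col in zip(*grid)]
    # Transpose back to band-major (K x T).
    return [list(row) for row in zip(*cols)]


# Toy input: 2 frames, 8 frequency bins, 3 bands of widths 2, 2, and 4.
spectrogram = [
    [1, 1, 2, 2, 4, 4, 4, 4],
    [3, 3, 2, 2, 0, 0, 0, 0],
]
band_edges = [0, 2, 4, 8]

grid = band_features(spectrogram, band_edges)
mixed = dual_path_block(grid)
```

Swapping `running_mean` for a real sequence layer leaves the reshaping logic unchanged, which is the main appeal of the dual-path layout.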

We introduce a new music source separation model tailored for accurate vocal isolation. Unlike Transformer-based approaches, which often fail to capture intermittently occurring vocals, our model leverages Mamba2, a recent state space model, to better capture long-range temporal dependencies. To handle long input sequences efficiently, we combine a band-splitting strategy with a dual-path architecture. Experiments show that our approach outperforms recent state-of-the-art models, achieving a cSDR of 11.03 dB, the best reported to date, and delivering substantial gains in uSDR. Moreover, the model exhibits stable and consistent performance across varying input lengths and vocal occurrence patterns. These results demonstrate the effectiveness of Mamba-based models for high-resolution audio processing and open up new directions for broader applications in audio research.
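The abstract reports cSDR and uSDR without defining them. In the source separation literature, cSDR usually denotes a chunk-level score (the median SDR over short chunks) and uSDR an utterance-level score (one SDR over the whole track). The sketch below illustrates that distinction on a toy signal. It uses the plain energy-ratio form of SDR, not the BSS Eval variant found in common evaluation toolkits. The function names, the chunk length, and the choice to skip silent reference chunks are assumptions for illustration, not the paper's evaluation protocol.

```python
import math
from statistics import median


def sdr_db(reference, estimate, eps=1e-9):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def utterance_sdr(reference, estimate):
    """uSDR-style score: a single SDR computed over the whole track."""
    return sdr_db(reference, estimate)


def chunk_sdr(reference, estimate, chunk_len):
    """cSDR-style score: median SDR over non-overlapping chunks.

    Assumption: chunks whose reference is entirely silent are skipped,
    because the energy ratio is not meaningful there.
    """
    scores = []
    for start in range(0, len(reference) - chunk_len + 1, chunk_len):
        ref = reference[start:start + chunk_len]
        est = estimate[start:start + chunk_len]
        if sum(r * r for r in ref) == 0.0:
            continue
        scores.append(sdr_db(ref, est))
    return median(scores) if scores else float("nan")


# Toy track: vocals in the first half, silence in the second half.
reference = [1.0, -1.0] * 4 + [0.0] * 8
# Estimate: slightly attenuated vocals, plus a small leak during the silence.
estimate = [0.9 * r for r in reference[:8]] + [0.01] * 8

u_sdr = utterance_sdr(reference, estimate)           # about 19.96 dB
c_sdr = chunk_sdr(reference, estimate, chunk_len=4)  # about 20.00 dB
```

In this toy case the leak during the silent half lowers the utterance-level score but leaves the chunk-level median untouched, which is one reason the two metrics can disagree on tracks with sparse vocals.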

