SD LG AS | Dec 24, 2024

U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation

Shaoxiang Dang, Tetsuya Matsumoto, Yoshinori Takeuchi, Hiroaki Kudo

arXiv:2412.18217v1 | 6 citations | h-index: 10 | APSIPA

Originality: Incremental advance

AI Analysis

This work addresses the computational burden for researchers in speech separation, offering a more efficient model for complex environments.

The paper tackles the high computational cost of speech separation models by proposing U-Mamba-Net, a lightweight Mamba-based U-Net style network for noisy and reverberant speech separation. On the Libri2Mix dataset, it reports improved performance at low computational cost.

Speech separation splits mixed speech from multiple overlapping speakers into several streams, each containing speech from only one speaker. Many highly effective models have emerged and proliferated rapidly over time. However, the size and computational load of these models have increased accordingly. This is a serious burden for the community, as researchers need more time and computational resources to reproduce and compare existing models. In this paper, we propose U-Mamba-Net: a lightweight Mamba-based U-style model for speech separation in complex environments. Mamba is a state space sequence model that incorporates feature selection capabilities. The U-style network is a fully convolutional neural network whose symmetric contracting and expansive paths are able to learn multi-resolution features. In our work, Mamba serves as a feature filter, alternating with U-Net. We test the proposed model on Libri2Mix. The results show that U-Mamba-Net achieves improved performance with low computational cost.
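To make the alternating structure described in the abstract more concrete, here is a toy, pure-Python sketch. It is not the authors' implementation. All function names (`selective_scan`, `u_block`, `u_mamba_sketch`) and parameters are hypothetical. The scalar recurrence stands in for a Mamba block, fixed average pooling stands in for learned convolutions, and the ordering of the two stages is an assumption.

```python
import math

def selective_scan(x, decay_w=1.0, in_w=1.0):
    """Toy scalar selective state space scan (assumption, not the paper's Mamba block).

    h_t = a_t * h_{t-1} + (1 - a_t) * x_t, where a_t depends on the input x_t.
    """
    h = 0.0
    out = []
    for x_t in x:
        # Input-dependent step size: the "selection" idea in miniature.
        delta = math.log1p(math.exp(in_w * x_t))  # softplus, always > 0
        a_t = math.exp(-decay_w * delta)          # decay factor in (0, 1)
        h = a_t * h + (1.0 - a_t) * x_t
        out.append(h)
    return out

def downsample(x):
    # Average adjacent pairs: halves the length (expects an even length).
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

def upsample(x):
    # Nearest-neighbour upsampling: doubles the length.
    return [v for v in x for _ in range(2)]

def u_block(x, depth=2):
    """Toy U-style path: contract, then expand with additive skip connections."""
    if depth == 0 or len(x) % 2:
        return list(x)
    inner = u_block(downsample(x), depth - 1)
    # Skip connection: add the full-resolution input to the upsampled coarse path.
    return [s + u for s, u in zip(x, upsample(inner))]

def u_mamba_sketch(x, n_stages=2):
    # Alternate the sequence filter and the U-style block (ordering is an assumption).
    for _ in range(n_stages):
        x = u_block(selective_scan(x))
    return x
```

In this sketch the output has the same length as the input, so stages can be stacked freely. A real model would operate on learned multi-channel features rather than a single list of floats.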

