ASSD Nov 15, 2021

Monaural source separation: From anechoic to reverberant environments

arXiv:2111.07578v2, 33 citations
AI Analysis

This work questions the practical usefulness of recent improvements in monaural source separation by showing that gains reported on anechoic data may not carry over to real-world reverberant conditions, a concern for applications in audio processing and speech recognition.

The paper adapts neural network-based monaural speech source separation from anechoic to reverberant environments. The modified SepFormer improves word error rate by 7 percentage points over the standard SepFormer implementation, yet ends up only marginally better than a simpler PIT-BLSTM separation system.

Impressive progress in neural network-based single-channel speech source separation has been made in recent years. But those improvements have been mostly reported on anechoic data, a situation that is hardly met in practice. Taking the SepFormer as a starting point, which achieves state-of-the-art performance on anechoic mixtures, we gradually modify it to optimize its performance on reverberant mixtures. Although this leads to a word error rate improvement of 7 percentage points compared to the standard SepFormer implementation, the system ends up with only marginally better performance than a PIT-BLSTM separation system that is optimized with rather straightforward means. This is surprising and at the same time sobering, challenging the practical usefulness of many improvements reported in recent years for monaural source separation on nonreverberant data.
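
The "PIT" in PIT-BLSTM refers to permutation invariant training: because the order of a separator's output channels is arbitrary, the loss is evaluated for every assignment of outputs to reference speakers and the smallest value is kept. The sketch below illustrates that idea only. The function names `mse` and `pit_loss` are hypothetical, and mean squared error on raw samples is an assumed stand-in, since the abstract does not state which training objective the paper's systems use.

```python
from itertools import permutations


def mse(estimate, reference):
    """Mean squared error between two equal-length sample sequences."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def pit_loss(estimates, references):
    """Return (loss, assignment) minimizing the mean MSE over all permutations.

    assignment[i] is the index of the reference matched to estimates[i].
    MSE is an assumption made for illustration, not the paper's objective.
    """
    n = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the per-speaker error under this output-to-reference mapping.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the two estimated signals come out in swapped order.
references = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
estimates = [[0.5, 0.5, 0.4], [0.9, 0.1, -1.0]]
loss, assignment = pit_loss(estimates, references)
print(assignment, round(loss, 4))
```

The exhaustive search costs a factorial number of loss evaluations in the number of speakers, which is negligible for the two-speaker mixtures typically considered in this setting.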

