AS · LG · SD · SP · Oct 20, 2021

REAL-M: Towards Speech Separation on Real Mixtures

arXiv:2110.10812v1 · 27 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the gap in evaluating speech separation for real-world applications, though it is incremental as it focuses on evaluation rather than a new separation method.

The paper tackles the problem of evaluating speech separation models on real-world mixtures by introducing the REAL-M dataset and a blind SI-SNR neural estimator, showing that the estimator correlates well with human opinions and aligns with performance trends on synthetic benchmarks.

In recent years, deep learning-based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to filling this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Second, we address the problem of performance evaluation of real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures. The performance predictions of the SI-SNR estimator indeed correlate well with human opinions. Moreover, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks when evaluating popular speech separation models.
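For readers unfamiliar with the metric, the sketch below computes the commonly used reference-based SI-SNR between a separated signal and its clean target. It is not the paper's blind neural estimator, which predicts this quantity without access to the target. The function name `si_snr`, the `eps` stabilizer, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Reference-based SI-SNR in dB (illustrative helper, not from the paper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so the measure ignores any DC offset.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target; this makes the result
    # independent of the overall gain of the estimate.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target

    # Whatever is left after the projection is treated as error.
    residual = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)
```

Because real recordings such as those in REAL-M have no clean `target`, this function cannot be applied to them directly, which is the gap the blind estimator is meant to address.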

Code Implementations: 1 repo
