SD | AI | Oct 12, 2020

The Cone of Silence: Speech Separation by Localization

arXiv:2010.06007v1 | 69 citations
Originality: Highly original
AI Analysis

This paper addresses speech separation and localization for applications such as hearing aids and voice assistants, with incremental improvements in handling moving speakers and speaker counts not seen during training.

The paper tackles the problem of separating and localizing multiple concurrent speakers from multi-microphone recordings, achieving state-of-the-art performance in both tasks, especially in high noise conditions.

Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
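The coarse-to-fine search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the deep network is replaced by a hypothetical toy oracle (`make_toy_oracle`) that simply counts how many known source angles fall inside the window, and the starting window of 90 degrees, the stopping width `w_min`, and the energy threshold are all assumptions chosen for illustration. Only the halving of the window at each step reflects the abstract.

```python
from typing import Callable, List, Sequence, Tuple


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def make_toy_oracle(source_angles: Sequence[float]) -> Callable[[float, float], float]:
    """Hypothetical stand-in for the separation network.

    Returns a function energy(theta, w) that counts the sources lying
    inside the angular region theta +/- w/2. A real system would instead
    run the network and measure the energy of its output waveform.
    """
    def energy(theta: float, w: float) -> float:
        return float(sum(angular_distance(s, theta) <= w / 2 for s in source_angles))
    return energy


def localize(
    energy: Callable[[float, float], float],
    w_start: float = 90.0,   # assumed initial window size
    w_min: float = 2.0,      # assumed stopping resolution
    threshold: float = 0.5,  # assumed "region is non-empty" threshold
) -> Tuple[List[float], float]:
    """Binary search over angle: keep non-empty regions, halve w, repeat."""
    # Tile the full circle with windows of width w_start.
    n = int(round(360.0 / w_start))
    candidates = [w_start / 2 + i * w_start for i in range(n)]
    w = w_start
    while True:
        # Discard regions in which the oracle reports no source.
        candidates = [t for t in candidates if energy(t, w) > threshold]
        if w <= w_min:
            break
        # Split each surviving region into two halves of width w / 2.
        w /= 2
        candidates = [t + sign * w / 2 for t in candidates for sign in (-1, 1)]
    return candidates, w


if __name__ == "__main__":
    true_angles = [30.0, 200.0]
    found, final_w = localize(make_toy_oracle(true_angles))
    print(found, final_w)
```

Because empty regions are discarded before each split, the number of oracle calls in this sketch grows with the number of sources times the number of halvings, rather than with the number of fine-grained windows on the circle. Each returned angle is the centre of a final window of width `final_w`, so it lies within `final_w / 2` of a true source in this toy setting.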

Code Implementations: 1 repo
