SD · LG · AS · Nov 4, 2020

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

arXiv:2011.02329v1 · 32 citations
AI Analysis

This paper addresses speaker separation in real-world acoustic environments, with applications such as speech recognition and audio processing. It represents an incremental improvement over existing methods.

The paper tackles the problem of separating an unknown number of speakers from a single-channel audio signal under noisy and reverberant conditions, achieving superior performance compared to a baseline model. It also introduces a new dataset with up to five simultaneous speakers in such settings.

We present a unified network for voice separation of an unknown number of speakers. The proposed approach is composed of several separation heads optimized together with a speaker classification branch. The separation is carried out in the time domain, together with parameter sharing between all separation heads. The classification branch estimates the number of speakers while each head is specialized in separating a different number of speakers. We evaluate the proposed model under both clean and noisy reverberant settings. Results suggest that the proposed approach is superior to the baseline model by a significant margin. Additionally, we present a new noisy and reverberant dataset of up to five different speakers speaking simultaneously.
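The routing idea in the abstract (a classification branch estimates the speaker count, then the head specialized for that count produces the output) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `make_dummy_head` and `separate_unknown_count`, the head set for 2 to 5 speakers, and the placeholder heads that simply split the mixture evenly are all assumptions made for illustration. A real system would use trained time-domain networks with shared parameters.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]
Head = Callable[[Signal], List[Signal]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the classifier logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_dummy_head(num_speakers: int) -> Head:
    """Hypothetical stand-in for a trained separation head.

    It only splits the mixture evenly into `num_speakers` channels;
    it performs no real separation.
    """
    def head(mixture: Signal) -> List[Signal]:
        return [[x / num_speakers for x in mixture] for _ in range(num_speakers)]
    return head


def separate_unknown_count(
    mixture: Signal,
    count_logits: Sequence[float],
    heads: Dict[int, Head],
    speaker_counts: Sequence[int],
) -> Tuple[int, List[Signal]]:
    """Pick the most probable speaker count, then run the matching head."""
    probs = softmax(count_logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    k = speaker_counts[best]
    return k, heads[k](mixture)


# Assumed head set: one head per speaker count from 2 to 5.
SPEAKER_COUNTS = [2, 3, 4, 5]
HEADS = {k: make_dummy_head(k) for k in SPEAKER_COUNTS}
```

Calling `separate_unknown_count(mixture, count_logits, HEADS, SPEAKER_COUNTS)` returns the estimated count and one signal per estimated speaker. In this sketch the logits are passed in directly; in the described model they would come from the classification branch applied to the mixture.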

Code Implementations: 2 repos