May 26, 2019

Auditory Separation of a Conversation from Background via Attentional Gating

arXiv:1905.10751v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses auditory scene analysis for applications such as hearing aids and audio processing. It is an incremental advance: it builds on existing separation methods and adds a novel attentional mechanism.

The authors tackle the problem of separating a conversation from background chatter in a sound mixture with an unknown number of sources. Their model improves single-speaker separation by 9% over comparable models, and on a new task of separating a subset of several speakers it performs only 7% worse than on the single-speaker task.

We present a model for separating a set of voices out of a sound mixture containing an unknown number of sources. Our Attentional Gating Network (AGN) uses a variable attentional context to specify which speakers in the mixture are of interest. The attentional context is specified by an embedding vector which modifies the processing of a neural network through an additive bias. Individual speaker embeddings are learned to separate a single speaker while superpositions of the individual speaker embeddings are used to separate sets of speakers. We first evaluate AGN on a traditional single speaker separation task and show an improvement of 9% with respect to comparable models. Then, we introduce a new task to separate an arbitrary subset of voices from a mixture of an unknown-sized set of voices, inspired by the human ability to separate a conversation of interest from background chatter at a cafeteria. We show that AGN is the only model capable of solving this task, performing only 7% worse than on the single speaker separation task.
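The gating idea in the abstract (a context embedding added to the network as a bias, with a set of speakers encoded by superposing individual speaker embeddings) can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the hypothetical functions `superpose` and `gated_layer`, the single ReLU layer, and the choice of a plain element-wise sum as the "superposition". It is not the authors' AGN implementation, and this summary does not say where in their network the bias is applied.

```python
"""Toy sketch of attentional gating via an additive bias (hypothetical; not the AGN code)."""
from typing import Dict, List, Sequence

Vector = List[float]


def superpose(embeddings: Sequence[Vector]) -> Vector:
    """Element-wise sum of speaker embeddings, used here as the context for a set of speakers."""
    if not embeddings:
        raise ValueError("need at least one speaker embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(e[i] for e in embeddings) for i in range(dim)]


def gated_layer(x: Vector, weights: List[Vector], context: Vector) -> Vector:
    """One hidden layer whose pre-activations receive the context as an additive bias.

    h_j = relu(sum_i W[j][i] * x[i] + context[j])
    """
    if len(context) != len(weights):
        raise ValueError("context must have one entry per hidden unit")
    hidden = []
    for row, bias in zip(weights, context):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + bias
        hidden.append(max(0.0, pre_activation))  # ReLU
    return hidden


if __name__ == "__main__":
    # Hypothetical "learned" embeddings for three speakers (dimension = 3 hidden units).
    speakers: Dict[str, Vector] = {
        "alice": [0.5, -0.2, 0.0],
        "bob": [-0.1, 0.4, 0.3],
        "carol": [0.0, 0.0, -0.6],
    }
    W = [[0.2, 0.1], [0.0, 0.3], [-0.4, 0.5]]  # 3 hidden units, 2 input features
    x = [1.0, 2.0]  # toy input features for one frame of the mixture

    # Attend to a single speaker.
    print(gated_layer(x, W, speakers["alice"]))
    # Attend to a two-person conversation by superposing the two embeddings.
    conversation = superpose([speakers["alice"], speakers["bob"]])
    print(gated_layer(x, W, conversation))
```

Because the context enters as a bias before the nonlinearity, attending to a conversation needs no extra parameters in this sketch: the set context is just the sum of the single-speaker vectors, so any subset of known speakers can be selected at inference time.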
