SDLGNE · Mar 20, 2015

Deep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis

arXiv:1503.06046v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speech separation in noisy environments. Its contribution is incremental: it applies existing neural network methods to a known problem.

The paper tackles the problem of separating concurrent speech streams in cocktail party scenarios. It trains convolutive autoencoder deep neural networks to perform probabilistic re-synthesis directly in the time domain, and shows that even simple networks can exploit both monaural and binaural information.

In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNNs) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNNs as convolutive deep transform (CDT) devices to perform probabilistic re-synthesis. The CDTs operated directly in the time domain. Our simulations demonstrate that very simple neural networks are capable of exploiting monaural and binaural information available in a cocktail party listening scenario.
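The abstract distinguishes a monaural network from a binaural one but gives no architecture. As a rough sketch under stated assumptions, the difference can be pictured at the input: a monaural input is one time-domain window, while a binaural input concatenates the left and right windows, doubling the input size so inter-aural differences are available to the network. The helpers `make_frames`, `init_dnn` and `forward`, the layer sizes, and the sigmoid hidden layer are all hypothetical choices, and the weights are random (untrained).

```python
import numpy as np

def make_frames(left, right=None, frame_len=32, hop=1):
    """Stack sliding windows into network input vectors (hypothetical helper).

    Monaural: each row is one window of `frame_len` samples.
    Binaural: left and right windows are concatenated (2 * frame_len values).
    """
    left = np.asarray(left, dtype=float)
    if len(left) < frame_len:
        raise ValueError("signal shorter than one frame")
    starts = range(0, len(left) - frame_len + 1, hop)
    if right is None:
        return np.stack([left[s:s + frame_len] for s in starts])
    right = np.asarray(right, dtype=float)
    if right.shape != left.shape:
        raise ValueError("left and right channels must have the same length")
    return np.stack([np.concatenate([left[s:s + frame_len], right[s:s + frame_len]])
                     for s in starts])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_dnn(n_in, n_hidden, n_out, seed=0):
    # Assumed sizes and random, untrained weights, for illustration only.
    rng = np.random.default_rng(seed)
    return {"W1": rng.standard_normal((n_in, n_hidden)) * 0.1, "b1": np.zeros(n_hidden),
            "W2": rng.standard_normal((n_hidden, n_out)) * 0.1, "b2": np.zeros(n_out)}

def forward(params, x):
    # One sigmoid hidden layer, linear output: one target window per input row.
    h = sigmoid(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

# Usage: both variants emit a single-channel target window per input frame.
rng = np.random.default_rng(1)
left, right = rng.standard_normal(200), rng.standard_normal(200)
mono, bi = make_frames(left, frame_len=32), make_frames(left, right, frame_len=32)
print(mono.shape, bi.shape)  # (169, 32) (169, 64)
est_mono = forward(init_dnn(32, 48, 32), mono)
est_bi = forward(init_dnn(64, 48, 32), bi)
```

Only the input width changes between the two variants in this sketch; whatever the paper's actual networks learn from the extra channel is not reproduced here.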
