SD | LG | NE | Apr 12, 2015

Deep Transform: Cocktail Party Source Separation via Complex Convolution in a Deep Neural Network

arXiv:1504.02945v1 | 9 citations
Originality: Incremental advance
AI Analysis

This work addresses source separation for audio processing applications, but it is incremental as it matches rather than surpasses prior non-complex approaches.

The paper tackles cocktail party source separation by introducing a complex convolutional deep neural network that estimates both the magnitude and the phase of source spectrograms from a monaural mixture, achieving results comparable to existing binary-mask methods.

Convolutional deep neural networks (DNN) are state of the art in many engineering problems but have not yet addressed the issue of how to deal with complex spectrograms. Here, we use circular statistics to provide a convenient probabilistic estimate of spectrogram phase in a complex convolutional DNN. In a typical cocktail party source separation scenario, we trained a convolutional DNN to re-synthesize the complex spectrograms of two source speech signals given a complex spectrogram of the monaural mixture - a discriminative deep transform (DT). We then used this complex convolutional DT to obtain probabilistic estimates of the magnitude and phase components of the source spectrograms. Our separation results are on a par with equivalent binary-mask based non-complex separation approaches.
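The abstract's appeal to circular statistics can be illustrated with a small sketch. Phase is an angle, so averaging several phase estimates arithmetically fails near the ±π wrap-around; the circular mean avoids this by averaging unit vectors instead. The example below is only an illustration of that general idea, not the paper's implementation: the function names `circular_mean` and `resultant_length` and the sample values are hypothetical.

```python
import cmath
import math


def circular_mean(phases):
    """Circular mean (radians) of a list of phase angles."""
    # Map each angle to a unit vector in the complex plane and sum them.
    resultant = sum(cmath.exp(1j * p) for p in phases)
    # The angle of the summed vector is the circular mean.
    return cmath.phase(resultant)


def resultant_length(phases):
    """Mean resultant length in [0, 1]; values near 1 mean the angles agree."""
    resultant = sum(cmath.exp(1j * p) for p in phases)
    return abs(resultant) / len(phases)


# Hypothetical phase estimates for one time-frequency bin,
# sitting on either side of the +/- pi wrap-around.
estimates = [math.pi - 0.1, -math.pi + 0.1]

naive = sum(estimates) / len(estimates)  # arithmetic mean, lands at 0
circ = circular_mean(estimates)          # lands at the wrap point, magnitude pi
spread = resultant_length(estimates)     # close to 1: the estimates agree
```

Here the two estimates are only 0.2 radians apart on the circle, yet their arithmetic mean points in the opposite direction. The circular mean recovers the expected angle, and the resultant length gives a simple measure of how concentrated the estimates are.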

