SD · LG · AS · ML · May 3, 2019

Deep Tensor Factorization for Spatially-Aware Scene Decomposition

arXiv:1905.01391v2 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses the analysis of complex audio scenes, with applications such as audio processing and surveillance. It appears incremental: it builds on existing tensor factorization methods by integrating deep learning techniques.

The authors address unsupervised audio scene decomposition under random microphone arrangements. They propose a neural network architecture that can be interpreted as a nonnegative tensor factorization, which separates the constituent sources and estimates their relative presence in each microphone without requiring labeled data.

We propose a completely unsupervised method to understand audio scenes observed with random microphone arrangements by decomposing the scene into its constituent sources and their relative presence in each microphone. To this end, we formulate a neural network architecture that can be interpreted as a nonnegative tensor factorization of a multi-channel audio recording. By clustering on the learned network parameters corresponding to channel content, we can learn sources' individual spectral dictionaries and their activation patterns over time. Our method allows us to leverage deep learning advances like end-to-end training, while also allowing stochastic minibatch training so that we can feasibly decompose realistic audio scenes that are intractable to decompose using standard methods. This neural network architecture is easily extensible to other kinds of tensor factorizations.
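To make the factorization idea concrete, here is a minimal sketch of a nonnegative tensor factorization of a multi-channel magnitude spectrogram. It is not the authors' network: it uses the classic multiplicative updates for a Euclidean loss on the full tensor, rather than end-to-end minibatch training. The function names, the tensor layout `(channels, freqs, frames)`, and the factor names `Q` (per-channel presence), `W` (spectral dictionary), and `H` (activations over time) are assumptions chosen for illustration.

```python
import numpy as np


def reconstruct(Q, W, H):
    """Rebuild the (channels, freqs, frames) tensor from the three factors."""
    return np.einsum("ck,fk,tk->cft", Q, W, H)


def relative_error(X, Q, W, H):
    """Frobenius-norm reconstruction error, relative to the norm of X."""
    return np.linalg.norm(X - reconstruct(Q, W, H)) / np.linalg.norm(X)


def ntf_multiplicative(X, n_components, n_iter=200, seed=0, eps=1e-9):
    """Nonnegative CP factorization X[c,f,t] ~ sum_k Q[c,k] W[f,k] H[t,k].

    X is a nonnegative array of shape (channels, freqs, frames).
    Returns Q (channels, K), W (freqs, K), H (frames, K), all nonnegative.
    """
    rng = np.random.default_rng(seed)
    C, F, T = X.shape
    # Strictly positive initialization so multiplicative updates can move.
    Q = rng.random((C, n_components)) + eps
    W = rng.random((F, n_components)) + eps
    H = rng.random((T, n_components)) + eps

    for _ in range(n_iter):
        # Each update multiplies a factor by (data term) / (model term),
        # which keeps every entry nonnegative.
        Q *= np.einsum("cft,fk,tk->ck", X, W, H) / (
            Q @ ((W.T @ W) * (H.T @ H)) + eps
        )
        W *= np.einsum("cft,ck,tk->fk", X, Q, H) / (
            W @ ((Q.T @ Q) * (H.T @ H)) + eps
        )
        H *= np.einsum("cft,ck,fk->tk", X, Q, W) / (
            H @ ((Q.T @ Q) * (W.T @ W)) + eps
        )
    return Q, W, H
```

Under these assumptions, row `c` of `Q` describes how strongly each component appears in microphone `c`. Grouping components with similar columns of `Q` is one way to read the abstract's clustering step, although the paper clusters learned network parameters rather than factors obtained this way.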

