SD · AS · ML · Feb 15, 2018

Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

arXiv:1802.05429v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses speech separation for audio processing applications, but it is incremental as it builds on existing NMF and optimal transport methods.

The paper tackles supervised speech blind source separation (BSS) by developing an optimal transport non-negative matrix factorization (NMF) algorithm, which yields perceptually better results than Euclidean NMF on both isolated voice reconstruction and BSS.

Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies, which takes into account how humans perceive sound. We give empirical evidence that using our proposed optimal transport NMF leads to perceptually better results than Euclidean NMF, for both isolated voice reconstruction and BSS tasks. Finally, we demonstrate how to use optimal transport for cross domain sound processing tasks, where frequencies represented in the input spectrograms may be different from one spectrogram to another.
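To make the idea of a transport cost between spectrogram frequencies concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a squared difference on the mel scale as the perceptual ground cost and uses plain Sinkhorn iterations as the entropic optimal transport solver. The function names (`hz_to_mel`, `frequency_cost`, `sinkhorn_cost`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel-scale formula; using it as the perceptual scale is an assumption.
    return 1127.0 * np.log1p(np.asarray(f_hz, dtype=float) / 700.0)

def frequency_cost(freqs_src, freqs_dst):
    # Assumed ground cost: squared mel distance between every pair of bins.
    # The two frequency grids may differ, which is what the cross-domain case needs.
    m_src = hz_to_mel(freqs_src)
    m_dst = hz_to_mel(freqs_dst)
    C = (m_src[:, None] - m_dst[None, :]) ** 2
    return C / C.max()  # normalise so the regularisation strength is scale-free

def sinkhorn_cost(a, b, C, eps=0.05, n_iter=500):
    # Entropy-regularised optimal transport between histograms a and b.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # transport plan
    return float(np.sum(P * C)), P

def bump(freqs, centre_hz, width_hz=150.0):
    # Toy spectrogram column: a normalised peak with a small positive floor.
    h = np.exp(-0.5 * ((freqs - centre_hz) / width_hz) ** 2) + 1e-6
    return h / h.sum()

freqs = np.linspace(0.0, 8000.0, 64)
C = frequency_cost(freqs, freqs)

a = bump(freqs, 1000.0)
b_near = bump(freqs, 1100.0)  # slightly shifted peak
b_far = bump(freqs, 4000.0)   # strongly shifted peak

cost_near, plan_near = sinkhorn_cost(a, b_near, C)
cost_far, plan_far = sinkhorn_cost(a, b_far, C)
print(cost_near, cost_far)

# Cross-domain case: source and target use different frequency grids.
C_cross = frequency_cost(freqs, np.linspace(0.0, 4000.0, 32))
```

In this toy setup the transport cost grows with how far the peak moves on the assumed perceptual scale, whereas a Euclidean loss compares bins one by one and takes no account of how far apart mismatched bins are. A rectangular cost matrix such as `C_cross` is what lets two spectrograms with different frequency grids be compared.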

