SD AS Oct 20, 2020

Phase recovery with Bregman divergences for audio source separation

Paul Magron, Pierre-Hugo Vial, Thomas Oberlin, Cédric Févotte

arXiv:2010.10255v2 1.9

Originality: Incremental advance

AI Analysis

This work addresses phase recovery for audio source separation and represents an incremental advance on a domain-specific task.

The paper tackles phase recovery in audio source separation by reformulating it as a minimization problem involving Bregman divergences. Experiments on a speech enhancement task show that, for several alternative losses, this approach outperforms the existing MISI algorithm.

Time-frequency audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a phase recovery algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has shown good performance in several recent works. This algorithm minimizes a quadratic reconstruction error between magnitude spectrograms. However, this loss does not properly account for some perceptual properties of audio, and alternative discrepancy measures such as beta-divergences have been preferred in many settings. In this paper, we propose to reformulate phase recovery in audio source separation as a minimization problem involving Bregman divergences. To optimize the resulting objective, we derive a projected gradient descent algorithm. Experiments conducted on a speech enhancement task show that this approach outperforms MISI for several alternative losses, which highlights their relevance for audio source separation applications.
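
To make the "alternative discrepancy measures" mentioned in the abstract concrete, the following is a minimal sketch of the beta-divergence between two positive magnitude vectors, using its standard textbook definition. It is not the authors' code and does not implement their projected gradient descent algorithm; the function name `beta_divergence` and the list-based interface are assumptions made for illustration only.

```python
import math

def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x_i | y_i).

    Hypothetical helper for illustration; x and y are sequences of
    strictly positive floats (e.g. flattened magnitude spectrograms).
    """
    total = 0.0
    for xi, yi in zip(x, y):
        if beta == 0:
            # Itakura-Saito divergence (scale-invariant)
            r = xi / yi
            total += r - math.log(r) - 1.0
        elif beta == 1:
            # Generalized Kullback-Leibler divergence
            total += xi * math.log(xi / yi) - xi + yi
        else:
            # General case; beta = 2 gives half the squared error
            total += (xi**beta + (beta - 1.0) * yi**beta
                      - beta * xi * yi**(beta - 1.0)) / (beta * (beta - 1.0))
    return total

target = [1.0, 2.0]
estimate = [1.0, 4.0]
quadratic = beta_divergence(target, estimate, 2)   # 0.5 * squared error
kl = beta_divergence(target, estimate, 1)
itakura_saito = beta_divergence(target, estimate, 0)
```

With beta = 2 the measure reduces to half the quadratic error, the kind of loss the abstract attributes to MISI. Beta = 1 and beta = 0 give the generalized Kullback-Leibler and Itakura-Saito divergences, which weight errors on low-energy components differently.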
