SD AS · Jul 30, 2018

Harmonic-Percussive Source Separation with Deep Neural Networks and Phase Recovery

Konstantinos Drossos, Paul Magron, Stylianos Ioannis Mimilakis, Tuomas Virtanen

arXiv:1807.11298v15.210 citations

Originality: Incremental advance

AI Analysis

This work addresses the separation of pitched and percussive instruments in music mixtures for audio processing applications. It is an incremental improvement over existing methods.

The paper tackles harmonic-percussive source separation in music. It applies the MaD TwinNet deep learning architecture to estimate the magnitude spectrogram of the percussive source, then uses a phase recovery algorithm to reconstruct the sources. The resulting system outperforms the previous state-of-the-art kernel additive model approach.

Harmonic/percussive source separation (HPSS) consists of separating the pitched instruments from the percussive parts in a music mixture. In this paper, we propose to apply the recently introduced Masker-Denoiser with twin networks (MaD TwinNet) system to this task. MaD TwinNet is a deep learning architecture that has reached state-of-the-art results in monaural singing voice separation. Herein, we propose to apply it to HPSS by using it to estimate the magnitude spectrogram of the percussive source. Then, we retrieve the complex-valued short-time Fourier transform of the sources by means of a phase recovery algorithm, which minimizes the reconstruction error and enforces the phase of the harmonic part to follow a sinusoidal phase model. Experiments conducted on realistic music mixtures show that this novel separation system outperforms the previous state-of-the-art kernel additive model approach.
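The two-stage structure described in the abstract (estimate a percussive magnitude spectrogram, then recover complex-valued STFTs for both sources) can be illustrated with a minimal sketch. It is not the paper's method: the percussive magnitude estimate is assumed to be given (in the paper it comes from MaD TwinNet), the harmonic magnitude is assumed here to be the non-negative residual of the mixture magnitude, and the phase recovery algorithm is replaced by the simplest baseline, which assigns the mixture phase to both sources. The function name `separate_with_mixture_phase` and all variable names are hypothetical.

```python
import numpy as np


def separate_with_mixture_phase(mix_stft, perc_mag_est):
    """Baseline reconstruction: both sources reuse the mixture phase.

    mix_stft     : complex array (frequency x time), STFT of the mixture.
    perc_mag_est : real array of the same shape, estimated percussive
                   magnitude (assumed to come from some separation model).
    """
    mix_mag = np.abs(mix_stft)
    # Keep the percussive magnitude within [0, |mixture|] so that the
    # harmonic residual below is never negative (an assumption of this sketch).
    perc_mag = np.clip(perc_mag_est, 0.0, mix_mag)
    harm_mag = mix_mag - perc_mag
    # Unit-modulus phase factor of the mixture.
    phase = np.exp(1j * np.angle(mix_stft))
    return perc_mag * phase, harm_mag * phase


# Toy usage with random data standing in for a real STFT.
rng = np.random.default_rng(0)
mix = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
perc_est = 0.3 * np.abs(mix)  # placeholder for a model's output
perc, harm = separate_with_mixture_phase(mix, perc_est)
print(np.allclose(perc + harm, mix))
```

With this baseline the two estimates sum exactly to the mixture, so the reconstruction error is zero, but nothing constrains the harmonic phase. The approach in the abstract differs on exactly that point: it additionally enforces a sinusoidal phase model on the harmonic part.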
