SD AS May 22, 2018

Music Source Separation Using Stacked Hourglass Networks

Sungheon Park, Taehoon Kim, Kyogu Lee, Nojun Kwak

arXiv:1805.08559v2 11.444 citations

Originality Synthesis-oriented

AI Analysis

This paper addresses the problem of separating multiple music sources for audio processing applications. Its contribution is incremental: it applies an existing method to a new task.

The paper tackles music source separation by adapting stacked hourglass networks from human pose estimation to generate masks from spectrograms, achieving competitive results on the MIR-1K and DSD100 datasets.

In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks. The stacked hourglass network, which was originally designed for human pose estimation in natural images, is applied to a music source separation task. The network learns features from a spectrogram image across multiple scales and generates masks for each music source. The estimated mask is refined as it passes through the stacked hourglass modules. The proposed framework is able to separate multiple music sources using a single network. Experimental results on the MIR-1K and DSD100 datasets validate that the proposed method achieves results competitive with state-of-the-art methods in multiple music source separation and singing voice separation tasks.
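
The mask-based separation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `apply_masks`, the array shapes, and the random placeholder masks (normalized to sum to one per time-frequency bin) are all assumptions made for illustration. In the paper the masks come from the stacked hourglass network.

```python
import numpy as np

def apply_masks(mixture_mag, masks):
    """Estimate per-source magnitude spectrograms by element-wise masking.

    mixture_mag: (freq_bins, frames) magnitude spectrogram of the mixture.
    masks:       (num_sources, freq_bins, frames), one mask per source.
    """
    mixture_mag = np.asarray(mixture_mag, dtype=float)
    masks = np.asarray(masks, dtype=float)
    if masks.ndim != 3 or masks.shape[1:] != mixture_mag.shape:
        raise ValueError("masks must have shape (num_sources, freq_bins, frames)")
    # Broadcast the mixture over the source axis and mask each source.
    return masks * mixture_mag[np.newaxis, :, :]

# Placeholder data (assumption): random values stand in for a real
# spectrogram and for the network's predicted masks.
rng = np.random.default_rng(0)
mixture_mag = rng.random((512, 64))
raw = rng.random((4, 512, 64))
# Illustrative choice: normalize so the four masks sum to one in every bin.
masks = raw / raw.sum(axis=0, keepdims=True)

sources = apply_masks(mixture_mag, masks)
```

With masks normalized this way, the estimated source magnitudes add back up to the mixture magnitude. Recovering waveforms would additionally require a phase estimate (commonly the mixture phase) and an inverse STFT, which this sketch omits.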
