SD · AS · May 22, 2018

Music Source Separation Using Stacked Hourglass Networks

arXiv:1805.08559v2 · 44 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of separating multiple music sources for audio processing applications, but it is incremental because it applies an existing method to a new task.

The paper tackles music source separation by adapting stacked hourglass networks from human pose estimation to generate per-source masks from spectrograms, achieving competitive results on the MIR-1K and DSD100 datasets.
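Mask-based separation means the network predicts one mask per source, and each mask is multiplied element-wise with the mixture's magnitude spectrogram to estimate that source. The sketch below shows only that final masking step in plain Python. It assumes masks already lie in [0, 1] and share the mixture's `[freq][time]` shape. The function name `apply_masks` and the source labels are hypothetical and not taken from the paper or its code.

```python
from typing import Dict, List

# Hypothetical layout: magnitude spectrogram stored as rows of frequency bins,
# each row holding one value per time frame.
Spectrogram = List[List[float]]


def apply_masks(mixture: Spectrogram, masks: Dict[str, Spectrogram]) -> Dict[str, Spectrogram]:
    """Estimate each source by element-wise multiplying the mixture magnitude
    spectrogram with that source's mask (assumed to be in [0, 1])."""
    estimates: Dict[str, Spectrogram] = {}
    for name, mask in masks.items():
        # Every mask must have exactly the mixture's shape.
        if len(mask) != len(mixture) or any(len(m) != len(x) for m, x in zip(mask, mixture)):
            raise ValueError(f"mask for {name!r} does not match the mixture shape")
        estimates[name] = [
            [x * m for x, m in zip(mix_row, mask_row)]
            for mix_row, mask_row in zip(mixture, mask)
        ]
    return estimates


mixture = [[1.0, 2.0], [4.0, 0.5]]
masks = {
    "vocals": [[0.25, 0.5], [1.0, 0.0]],
    "accompaniment": [[0.75, 0.5], [0.0, 1.0]],
}
estimates = apply_masks(mixture, masks)
```

When the masks sum to 1 at every bin, as in this toy example, the estimates add back up to the mixture. Turning an estimated magnitude back into audio typically reuses the mixture phase, a common choice that this summary does not confirm for the paper.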

In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks. The stacked hourglass network, originally designed for human pose estimation in natural images, is applied to the music source separation task. The network learns features from a spectrogram image across multiple scales and generates a mask for each music source. The estimated mask is refined as it passes through the stacked hourglass modules. The proposed framework separates multiple music sources using a single network. Experimental results on the MIR-1K and DSD100 datasets show that the proposed method achieves results comparable to state-of-the-art methods on both multiple music source separation and singing voice separation.
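The multi-scale, stacked refinement described in the abstract can be sketched as a toy in plain Python. An hourglass keeps a full-resolution skip branch, recursively processes a downsampled copy, upsamples it, and adds the two branches. Several hourglasses run in sequence, each emitting a mask. The sketch loosely follows the generic stacked-hourglass design and is not the authors' implementation. The `block` argument stands in for learned convolutional layers, a single channel replaces the real network's one-mask-per-source output, and the sigmoid masks, residual re-injection of the input, and per-stack L1 loss (intermediate supervision) are assumptions. All function names are hypothetical.

```python
import math
from typing import Callable, List

Grid = List[List[float]]  # [freq][time] feature map; both sides divisible by 2**depth


def avg_pool2(x: Grid) -> Grid:
    """Halve both axes by averaging non-overlapping 2x2 blocks."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample2(x: Grid) -> Grid:
    """Double both axes by nearest-neighbour repetition."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def hourglass(x: Grid, depth: int, block: Callable[[Grid], Grid]) -> Grid:
    """Toy hourglass: a full-resolution skip branch plus a recursively processed
    half-resolution branch, upsampled and merged by addition."""
    skip = block(x)
    low = block(avg_pool2(x))
    if depth > 1:
        low = hourglass(low, depth - 1, block)
    return add(skip, upsample2(block(low)))


def sigmoid_mask(x: Grid) -> Grid:
    """Squash features into a (0, 1) mask (assumed activation)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in x]


def stacked_masks(x: Grid, num_stacks: int, depth: int,
                  block: Callable[[Grid], Grid]) -> List[Grid]:
    """Run hourglasses in sequence; each stack emits a mask, and its features
    (plus the original input, as a residual) feed the next stack."""
    masks: List[Grid] = []
    features = x
    for _ in range(num_stacks):
        features = hourglass(features, depth, block)
        masks.append(sigmoid_mask(features))
        features = add(features, x)
    return masks


def stacked_l1_loss(mixture: Grid, masks: List[Grid], target: Grid) -> float:
    """Intermediate supervision: sum over stacks of the mean absolute error
    between the masked mixture and the target source spectrogram."""
    total = 0.0
    n = len(mixture) * len(mixture[0])
    for mask in masks:
        err = sum(abs(m * v - t)
                  for rm, rv, rt in zip(mask, mixture, target)
                  for m, v, t in zip(rm, rv, rt))
        total += err / n
    return total
```

For example, `stacked_masks([[1.0] * 4 for _ in range(4)], num_stacks=2, depth=2, block=lambda g: g)` returns two 4x4 masks, and the later mask is closer to 1. Supervising every stack, rather than only the last, is what lets each module refine the previous estimate.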

Code Implementations: 4 repos
