SDAS · Aug 19, 2019

Audio query-based music source separation

arXiv:1908.06593v1 · 54 citations
AI Analysis

This work addresses a limitation of previous methods, which could separate only a few predefined sources, and offers a more flexible approach for music information retrieval applications.

The authors tackle the problem of separating arbitrary sources in music mixtures by proposing an audio query-based network that encodes source information from a query signal. A single network can then separate multiple sources, which the authors evaluate on the MUSDB18 dataset.

In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning have led to substantial progress in music source separation performance. However, most previous studies are restricted to separating a limited number of sources, such as vocals, drums, bass, and other. In this study, we propose a network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals. The proposed method consists of a Query-net and a Separator: given a query and a mixture, the Query-net encodes the query into the latent space, and the Separator estimates masks conditioned on the latent vector, which are then applied to the mixture for separation. The Separator can also generate masks using the latent vector from the training samples, allowing separation in the absence of a query. We evaluate our method on the MUSDB18 dataset, and experimental results show that the proposed method can separate multiple sources with a single network. In addition, through further investigation of the latent space, we demonstrate that our method can generate continuous outputs via latent vector interpolation.
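The data flow described in the abstract (query to latent vector, latent vector to mask, mask applied to mixture) can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the layer shapes, the time-averaging encoder, the bias-style conditioning, and all names (`query_net`, `separator`, `separate`, `interpolate`, `N_FREQ`, `LATENT_DIM`) are assumptions made for illustration, and random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only (not from the paper).
N_FREQ, LATENT_DIM = 64, 8

# Randomly initialised weights stand in for trained parameters.
W_query = rng.normal(scale=0.1, size=(N_FREQ, LATENT_DIM))
W_mix = rng.normal(scale=0.1, size=(N_FREQ, N_FREQ))
W_cond = rng.normal(scale=0.1, size=(LATENT_DIM, N_FREQ))


def query_net(query_spec):
    """Encode a query magnitude spectrogram (frames, N_FREQ) into one latent vector."""
    frame_codes = np.tanh(query_spec @ W_query)  # (frames, LATENT_DIM)
    return frame_codes.mean(axis=0)              # average over time -> (LATENT_DIM,)


def separator(mix_spec, latent):
    """Estimate a soft mask in (0, 1) for the mixture, conditioned on the latent vector."""
    # Assumed conditioning scheme: the latent vector adds a per-frequency bias.
    logits = mix_spec @ W_mix + latent @ W_cond
    return 1.0 / (1.0 + np.exp(-logits))


def separate(mix_spec, latent):
    """Apply the estimated mask element-wise to the mixture spectrogram."""
    return separator(mix_spec, latent) * mix_spec


def interpolate(z_a, z_b, alpha):
    """Linearly interpolate between two latent vectors, with alpha in [0, 1]."""
    return (1.0 - alpha) * z_a + alpha * z_b


# Synthetic non-negative "spectrograms"; queries may differ in length.
mixture = np.abs(rng.normal(size=(100, N_FREQ)))
query_a = np.abs(rng.normal(size=(40, N_FREQ)))
query_b = np.abs(rng.normal(size=(55, N_FREQ)))

z_a, z_b = query_net(query_a), query_net(query_b)
estimate_a = separate(mixture, z_a)
estimate_mid = separate(mixture, interpolate(z_a, z_b, 0.5))
```

Because the query is reduced to a fixed-size latent vector, the same separator handles queries of any length, and a latent vector stored from training data could be passed to `separate` in place of an encoded query. Moving `alpha` between 0 and 1 in `interpolate` changes the mask smoothly, which is the kind of continuous output the abstract refers to.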

