Automatic Music Sample Identification with Multi-Track Contrastive Learning
This work addresses the challenge of sample identification, which matters both to music producers and for copyright enforcement, though the contribution is incremental in that it builds on existing contrastive learning methods.
The paper tackles the problem of automatically identifying sampled content in music, that is, detecting it and retrieving the original audio material. It reports significant performance improvements over previous state-of-the-art baselines, robustness across genres, and good scalability as the reference database grows.
Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learning objective. We show that this method significantly outperforms previous state-of-the-art baselines, that it is robust across various genres, and that it scales well as the number of noise songs in the reference database increases. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.
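To make the idea of building positive pairs from multi-track data more concrete, here is a minimal sketch. It is not the paper's pipeline: the abstract does not say how stems are partitioned, nor what the novel objective looks like. The sketch assumes that the stems of one song are split into two disjoint subsets that are each summed into an artificial mix, and it uses the standard InfoNCE loss as a stand-in for the paper's objective. The function names `make_positive_pair` and `info_nce` are hypothetical.

```python
import numpy as np


def make_positive_pair(stems, rng):
    """Split the stems of one song into two disjoint, non-empty subsets
    and sum each subset into an artificial mix.

    stems: array of shape (n_stems, n_samples), time-aligned stems.
    Assumption: a random disjoint split; the paper may pair mixes differently.
    """
    n_stems = stems.shape[0]
    if n_stems < 2:
        raise ValueError("need at least two stems to build a pair")
    order = rng.permutation(n_stems)
    cut = rng.integers(1, n_stems)  # 1 .. n_stems - 1, so both sides are non-empty
    mix_a = stems[order[:cut]].sum(axis=0)
    mix_b = stems[order[cut:]].sum(axis=0)
    return mix_a, mix_b


def info_nce(z_a, z_b, temperature=0.1):
    """Standard InfoNCE loss (a stand-in, not the paper's objective).

    z_a, z_b: arrays of shape (batch, dim); row i of z_a and row i of z_b
    are embeddings of the two mixes of the same song (a positive pair),
    and every other row in the batch acts as a negative.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stems = rng.standard_normal((4, 16000))  # 4 toy stems, 1 s at 16 kHz
    mix_a, mix_b = make_positive_pair(stems, rng)
    # Embeddings would come from a trained encoder; random vectors stand in here.
    z_a = rng.standard_normal((8, 32))
    z_b = z_a + 0.1 * rng.standard_normal((8, 32))
    print(info_nce(z_a, z_b))
```

In such a setup the two mixes share no stem, so an encoder trained to bring them together has to rely on cues that identify the song rather than on identical audio content, which is one way to read the abstract's emphasis on high-quality separated stems.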