Towards Cover Song Detection with Siamese Convolutional Neural Networks
This work addresses cover song identification for music information retrieval applications and represents an incremental improvement in the field.
The paper tackles cover song detection by learning audio representations with Siamese convolutional neural networks, and reports a mean precision@1 of 65% over mini-batches, ten times better than random guessing.
A cover song, by definition, is a new performance or recording of a previously recorded, commercially released song. It may be by the original artist or by a different artist altogether, and it can vary from the original in unpredictable ways, including key, arrangement, instrumentation, timbre and more. In this work we propose a novel approach to learning audio representations for the task of cover song detection. We train a neural architecture on tens of thousands of cover-song audio clips and test it on a held-out set. We obtain a mean precision@1 of 65% over mini-batches, ten times better than random guessing. Our results indicate that Siamese network configurations show promise for approaching the cover song identification problem.
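To make the reported metric concrete, the following is a minimal sketch of how a mean precision@1 over mini-batches could be computed from learned embeddings. The abstract does not spell out the evaluation protocol, so several details here are assumptions: each mini-batch is taken to hold one embedding vector per clip plus a label naming the underlying song, retrieval uses squared Euclidean distance, and a clip is never allowed to retrieve itself. The function names `precision_at_1` and `mean_precision_at_1` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def precision_at_1(embeddings, labels):
    """Fraction of clips whose nearest other clip in the batch is the same song.

    Assumption: `labels[i]` identifies the underlying song of `embeddings[i]`,
    so two clips with equal labels are covers of one another.
    """
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other clip in the batch; a clip must not retrieve itself.
        candidates = [j for j in range(len(embeddings)) if j != i]
        nearest = min(candidates, key=lambda j: squared_distance(query, embeddings[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(embeddings)


def mean_precision_at_1(batches):
    """Average precision@1 over an iterable of (embeddings, labels) mini-batches."""
    scores = [precision_at_1(emb, lab) for emb, lab in batches]
    return sum(scores) / len(scores)


# Toy batch: clips 0 and 1 cover one song, clips 2 and 3 are unrelated songs.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
toy_labels = ["song_a", "song_a", "song_b", "song_c"]
toy_score = precision_at_1(toy_embeddings, toy_labels)
```

In the toy batch, clips 0 and 1 retrieve each other correctly, while clips 2 and 3 retrieve each other despite belonging to different songs, so the score is 0.5. Under this reading, a result ten times better than random guessing at 65% would correspond to a chance level of roughly 6.5% per query.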