Musical Audio Similarity with Self-supervised Convolutional Neural Networks
This work addresses the need for more natural music search tools for video producers, though it is incremental as it builds on existing self-supervised and triplet loss methods.
The paper tackles music similarity search for video producers with a system that suggests similar-sounding track segments using a self-supervised convolutional neural network. Perceived similarity to query tracks averaged 7.8/10 in user testing.
We have built a music similarity search engine that lets video producers search by listenable music excerpts, as a complement to traditional full-text search. Our system suggests similar-sounding track segments from a large music catalog, using a self-supervised convolutional neural network trained with triplet loss terms and musical transformations. In semi-structured user interviews, professional video producers were impressed with the quality of the search experience, and perceived similarity to query tracks averaged 7.8/10 in user testing. We believe this search tool will provide a more natural search experience and make it easier to find music to soundtrack videos.
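
The abstract names a triplet loss and an embedding-based search without giving their details, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes L2-normalized embeddings, squared Euclidean distance, a margin of 0.2, and cosine-similarity ranking. The function names `l2_normalize`, `triplet_loss`, and `most_similar` are hypothetical, and the embeddings stand in for the output of the convolutional network, which is not modeled here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each embedding to unit length (assumed preprocessing step)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge-style triplet loss over a batch of embeddings.

    anchor, positive, negative: arrays of shape (batch, dim).
    In a self-supervised setup the positive could be a transformed
    version of the anchor segment; that pairing is an assumption here.
    The margin value is illustrative, not taken from the paper.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)  # anchor-to-positive distance
    d_neg = np.sum((a - n) ** 2, axis=-1)  # anchor-to-negative distance
    # Penalize triplets where the positive is not closer by at least `margin`.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def most_similar(query, catalog, k=3):
    """Return indices of the k catalog embeddings closest to the query.

    query: array of shape (dim,); catalog: array of shape (n_items, dim).
    Ranking uses cosine similarity, which is an assumed choice.
    """
    scores = l2_normalize(catalog) @ l2_normalize(query)
    return np.argsort(-scores)[:k]
```

With such a loss, a well-separated triplet contributes zero, while a triplet whose negative sits closer to the anchor than the positive contributes a positive penalty. At query time, segments would be ranked by their similarity to the embedding of the query excerpt.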