SD IR LG AS

Apr 15, 2023

Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging

Taketo Akama, Hiroaki Kitano, Katsuhiro Takematsu, Yasushi Miyajima, Natalia Polouliakh

arXiv:2304.07449v1 2.3 h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses scalability issues in music retrieval and tagging for applications like recommendation systems, but it is incremental as it builds on existing self-supervised learning approaches.

The paper addresses similarity-based retrieval and auto-tagging in music information retrieval by proposing metric learning with a self-supervised auxiliary loss. The method improves retrieval and tagging metrics in two settings: when human-annotated tags are available for all tracks, and when they are available for only a subset.

In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from alternative sources to enhance their performance. Self-supervised learning, which exclusively relies on learning signals derived from music audio data, has demonstrated its efficacy in the context of auto-tagging. In this study, we propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge by introducing our method of metric learning with a self-supervised auxiliary loss. Furthermore, diverging from conventional self-supervised learning methodologies, we discovered the advantages of concurrently training the model with both self-supervision and supervision signals, without freezing pre-trained models. We also found that refraining from employing augmentation during the fine-tuning phase yields better results. Our experimental results confirm that the proposed methodology enhances retrieval and tagging performance metrics in two distinct scenarios: one where human-annotated tags are consistently available for all music tracks, and another where such tags are accessible only for a subset of tracks.
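The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss functions, so everything below is an assumption: a hinge triplet loss stands in for the supervised metric-learning term, an InfoNCE-style contrastive loss stands in for the self-supervised auxiliary term, and the names `triplet_loss`, `contrastive_loss`, `joint_loss`, and `aux_weight` are hypothetical. The two "views" are assumed to be embeddings of two excerpts of the same track. When no tag-derived triplets exist for a batch, only the auxiliary term contributes, which mirrors the scenario where tags cover only a subset of tracks.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    # Supervised term (assumed): the positive should be closer to the
    # anchor than the negative by at least `margin`.
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def contrastive_loss(view_a, view_b, temperature=0.1):
    # Self-supervised term (assumed, InfoNCE-style): view_a[i] should match
    # view_b[i] and differ from every other view_b[j] in the batch.
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [dot(a_i, b_j) / temperature for b_j in b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def joint_loss(triplets, view_a, view_b, aux_weight=0.5):
    # Both terms are summed into one objective, so a single backward pass
    # would update the whole model (nothing is frozen in this sketch).
    # `triplets` may be empty when no tags are available for the batch.
    if triplets:
        supervised = sum(triplet_loss(*t) for t in triplets) / len(triplets)
    else:
        supervised = 0.0
    return supervised + aux_weight * contrastive_loss(view_a, view_b)


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    view_b = [[0.9, 0.1], [0.1, 0.9]]
    triplets = [([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])]
    print(joint_loss(triplets, view_a, view_b))  # tagged batch
    print(joint_loss([], view_a, view_b))        # untagged batch
```

In a real training setup these functions would operate on tensors in an autodiff framework; the pure-Python version only shows how the two signals combine into one objective.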

