SD · LG · AS · Jul 21, 2022

Learning Unsupervised Hierarchies of Audio Concepts

Darius Afchar, Romain Hennequin, Vincent Guigue

arXiv:2207.11231v1 · 8.34 citations · h-index: 22 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of high-level music interpretation for music information retrieval. It is an incremental adaptation of existing methods to a new domain.

The paper tackles the problem of interpreting music signals by adapting concept learning from computer vision to music, proposing a method to learn and automatically hierarchize numerous non-independent music concepts from audio, with evaluations showing alignment with ground-truth hierarchies and proxy sources of concept similarity.

Music signals are difficult to interpret from their low-level features, perhaps even more so than images: for example, highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was proposed to adjust explanations to the right abstraction level (e.g. detecting clinical concepts from radiographs). These methods have yet to be used for music information retrieval (MIR). In this paper, we adapt concept learning to the realm of music, with its particularities. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), unlike previous work that assumed disentangled concepts. We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, which serve as a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies are aligned both with ground-truth hierarchies of concepts, when available, and with proxy sources of concept similarity in the general case.
