SD · CL · AS · Oct 17, 2024

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

arXiv:2410.13267v2 · 17 citations · h-index: 16 · NAACL
Originality: Incremental advance
AI Analysis

This paper addresses the need for inclusive, global music information retrieval systems, though it appears incremental because it extends prior work.

The paper tackles the problem of linguistic diversity and multimodal integration in music information retrieval by introducing CLaMP 2, a system that supports 101 languages and both ABC notation and MIDI, achieving state-of-the-art results in multilingual semantic search and music classification.

Current music information retrieval systems face challenges in managing linguistic diversity and integrating various musical modalities. These limitations reduce their effectiveness in a global, multimodal music environment. To address these issues, we introduce CLaMP 2, a system compatible with 101 languages that supports both ABC notation (a text-based musical notation format) and MIDI (Musical Instrument Digital Interface) for music information retrieval. CLaMP 2, pre-trained on 1.5 million ABC-MIDI-text triplets, includes a multilingual text encoder and a multimodal music encoder aligned via contrastive learning. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale, significantly reducing textual noise and balancing language distribution. Our experiments show that CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities, thus establishing a new standard for inclusive and global music information retrieval.
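The abstract says the text and music encoders are aligned via contrastive learning but does not give the objective. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style loss over paired embeddings and a cosine-similarity ranking for semantic search. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper or its repository.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_logits(text_embs, music_embs, temperature):
    """Cosine similarity of every text/music pair, divided by the temperature."""
    texts = [l2_normalize(v) for v in text_embs]
    musics = [l2_normalize(v) for v in music_embs]
    return [
        [sum(a * b for a, b in zip(t, m)) / temperature for m in musics]
        for t in texts
    ]

def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct match for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(text_embs, music_embs, temperature=0.07):
    """Average of the text-to-music and music-to-text losses (assumed form).

    The temperature of 0.07 is an illustrative default, not a value from the paper.
    """
    logits = similarity_logits(text_embs, music_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))

def rank_music(query_emb, music_embs):
    """Return music indices ordered from most to least similar to the query."""
    query = l2_normalize(query_emb)
    scores = [
        sum(a * b for a, b in zip(query, l2_normalize(m))) for m in music_embs
    ]
    return sorted(range(len(music_embs)), key=lambda i: scores[i], reverse=True)

# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
toy_text = [[1.0, 0.0], [0.0, 1.0]]
toy_music_aligned = [[0.9, 0.1], [0.1, 0.9]]   # row i matches text i
toy_music_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

aligned_loss = symmetric_contrastive_loss(toy_text, toy_music_aligned)
swapped_loss = symmetric_contrastive_loss(toy_text, toy_music_swapped)
best_match = rank_music(toy_text[0], toy_music_aligned)[0]
```

In this toy setup the aligned pairs yield a lower loss than the swapped pairs, which is the property a contrastive objective optimizes for; at retrieval time only the ranking step is needed.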

Code Implementations: 1 repo