SD | LG | AS | May 9, 2025

Learning Music Audio Representations With Limited Data

arXiv:2505.06042v1 | h-index: 42 | Has Code | ICASSP
Originality: Synthesis-oriented
AI Analysis

This work addresses challenges in music AI for underrepresented genres and personalized applications where data is scarce, though its contribution is incremental because it analyzes existing models rather than proposing new ones.

The study investigated how music audio representation models perform with limited training data, finding that under certain conditions, limited-data and random models can match large-dataset models, though handcrafted features sometimes outperform learned ones.

Large deep-learning models for music, including those focused on learning general-purpose music audio representations, are often assumed to require substantial training data to achieve high performance. If true, this would pose challenges in scenarios where audio data or annotations are scarce, such as for underrepresented music traditions, non-popular genres, and personalized music creation and listening. Understanding how these models behave in limited-data scenarios could be crucial for developing techniques to tackle them. In this work, we investigate the behavior of several music audio representation models under limited-data learning regimes. We consider music models with various architectures, training paradigms, and input durations, and train them on data collections ranging from 5 to 8,000 minutes long. We evaluate the learned representations on various music information retrieval tasks and analyze their robustness to noise. We show that, under certain conditions, representations from limited-data and even random models perform comparably to ones from large-dataset models, though handcrafted features outperform all learned representations in some tasks.
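The abstract describes training on collections ranging from 5 to 8,000 minutes but does not say how those collections are drawn. As a minimal sketch only, one plausible way to build duration-budgeted training subsets is to shuffle the tracks once and accumulate them until a target number of minutes is reached. The function name, the 30-second clip length, the collection size, and the intermediate budgets (50 and 500 minutes) are all assumptions made for illustration, not details from the paper.

```python
import random


def sample_subset_by_minutes(durations_sec, target_minutes, seed=0):
    """Pick random tracks until their total duration reaches target_minutes.

    Hypothetical helper: the paper does not specify its sampling procedure.
    Returns the chosen track indices and the total duration in minutes.
    """
    rng = random.Random(seed)
    order = list(range(len(durations_sec)))
    rng.shuffle(order)  # same seed and same collection size give the same order

    target_sec = target_minutes * 60.0
    chosen, total_sec = [], 0.0
    for idx in order:
        if total_sec >= target_sec:
            break
        chosen.append(idx)
        total_sec += durations_sec[idx]
    return chosen, total_sec / 60.0


# Assumed collection: 20,000 clips of 30 seconds each (10,000 minutes in total).
durations = [30.0] * 20000

# 5 and 8,000 minutes are the endpoints stated in the abstract;
# 50 and 500 are invented intermediate budgets.
budgets = [5, 50, 500, 8000]
subsets = {m: sample_subset_by_minutes(durations, m, seed=0) for m in budgets}

for minutes, (indices, total_min) in subsets.items():
    print(f"budget={minutes} min -> {len(indices)} clips, {total_min:.1f} min")
```

Because every budget reuses the same seed, each smaller subset is a prefix of the larger ones, so the only thing that changes between training runs in this sketch is the amount of data.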

Code Implementations: 1 repo
