SD · AI · LG · Jan 20

ConceptCaps -- a Distilled Concept Dataset for Interpretability in Music Models

arXiv:2601.14157v1 · h-index: 2
Originality: Incremental advance
AI Analysis

ConceptCaps addresses the problem of noisy music datasets for researchers using concept-based interpretability methods, though the advance is incremental because it builds on existing pipelines.

The authors tackle the lack of structured concept data for interpretability in music models by introducing ConceptCaps, a dataset of 23k music-caption-audio triplets with explicit labels from a 200-attribute taxonomy. They validate it with audio-text alignment (CLAP), linguistic quality metrics (BERTScore, MAUVE), and TCAV analysis.

Concept-based interpretability methods like TCAV require clean, well-separated positive and negative examples for each concept. Existing music datasets lack this structure: tags are sparse, noisy, or ill-defined. We introduce ConceptCaps, a dataset of 23k music-caption-audio triplets with explicit labels from a 200-attribute taxonomy. Our pipeline separates semantic modeling from text generation: a VAE learns plausible attribute co-occurrence patterns, a fine-tuned LLM converts attribute lists into professional descriptions, and MusicGen synthesizes corresponding audio. This separation improves coherence and controllability over end-to-end approaches. We validate the dataset through audio-text alignment (CLAP), linguistic quality metrics (BERTScore, MAUVE), and TCAV analysis confirming that concept probes recover musically meaningful patterns. Dataset and code are available online.
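To make concrete why TCAV needs clean positive and negative example sets, the following is a minimal sketch of the TCAV idea on synthetic data. It is not the authors' code: the activations are random toy vectors, and the names `make_toy_data`, `fit_cav`, `toy_logit_grad`, and `tcav_score` are hypothetical helpers introduced only for illustration. A linear probe is fit to separate concept-positive from concept-negative activations, its normalized weight vector serves as the concept activation vector (CAV), and the TCAV score is the fraction of inputs whose class-logit gradient points in the CAV's direction.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_toy_data(seed=0, dim=4, n=100):
    """Synthetic 'activations' (an assumption for illustration): positives
    are shifted along one axis, negatives are unshifted Gaussian noise."""
    rng = random.Random(seed)
    direction = [1.0] + [0.0] * (dim - 1)
    neg = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    pos = [[rng.gauss(0, 1) + 2.0 * d for d in direction] for _ in range(n)]
    return pos, neg, direction


def fit_cav(pos, neg, steps=300, lr=0.1):
    """Fit a logistic-regression probe with batch gradient descent and
    return its unit-norm weight vector as the CAV."""
    dim = len(pos[0])
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))
            err = p - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w]


def toy_logit_grad(head):
    """Gradient of a toy class logit sum_i head[i] * tanh(a[i]) with respect
    to the activations a. A real model would supply this via autograd."""
    def grad(a):
        return [h * (1.0 - math.tanh(ai) ** 2) for h, ai in zip(head, a)]
    return grad


def tcav_score(grad_fn, inputs, cav):
    """Fraction of inputs whose logit gradient has a positive directional
    derivative along the CAV."""
    hits = sum(1 for x in inputs if dot(grad_fn(x), cav) > 0)
    return hits / len(inputs)


if __name__ == "__main__":
    pos, neg, direction = make_toy_data()
    cav = fit_cav(pos, neg)
    grad_fn = toy_logit_grad(head=[1.0, 0.0, 0.0, 0.0])
    print("cosine(CAV, true direction):", dot(cav, direction))
    print("TCAV score:", tcav_score(grad_fn, pos, cav))
```

If the positive and negative sets overlap or carry noisy labels, the probe's weight vector drifts away from the intended concept direction and the resulting score becomes hard to interpret, which is the gap a dataset with explicit per-concept labels is meant to close.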

