SD | AI | LG | AS | Jun 21, 2025

CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning

arXiv:2506.17818v1 | 2 citations | h-index: 5 | ISMIR
Originality: Incremental advance
AI Analysis

The paper addresses cultural bias in music AI, a problem relevant to both researchers and developers. The advance is incremental because it adapts existing foundation models rather than proposing a new architecture.

The paper tackles the limited effectiveness of music foundation models across diverse musical traditions by introducing CultureMERT-95M, a multi-culturally adapted model that achieves an average 4.9% improvement in ROC-AUC and AP on non-Western music auto-tagging tasks while maintaining performance on Western benchmarks.

Recent advances in music foundation models have improved audio representation learning, yet their effectiveness across diverse musical traditions remains limited. We introduce CultureMERT-95M, a multi-culturally adapted foundation model developed to enhance cross-cultural music representation learning and understanding. To achieve this, we propose a two-stage continual pre-training strategy that integrates learning rate re-warming and re-decaying, enabling stable adaptation even with limited computational resources. Training on a 650-hour multi-cultural data mix, comprising Greek, Turkish, and Indian music traditions, results in an average improvement of 4.9% in ROC-AUC and AP across diverse non-Western music auto-tagging tasks, surpassing prior state-of-the-art, with minimal forgetting on Western-centric benchmarks. We further investigate task arithmetic, an alternative approach to multi-cultural adaptation that merges single-culture adapted models in the weight space. Task arithmetic performs on par with our multi-culturally trained model on non-Western auto-tagging tasks and shows no regression on Western datasets. Cross-cultural evaluation reveals that single-culture models transfer with varying effectiveness across musical traditions, whereas the multi-culturally adapted model achieves the best overall performance. To support research on world music representation learning, we publicly release CultureMERT-95M and CultureMERT-TA-95M, fostering the development of more culturally aware music foundation models.
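
As a rough illustration of the two techniques named in the abstract, the sketch below shows a learning rate schedule that re-warms and then re-decays within one training stage, and a task arithmetic merge that adds the summed weight differences of several adapted models back onto a shared base. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear warm-up, the cosine decay, the function names, and the `scale` coefficient are all hypothetical choices made here, and plain floats stand in for weight tensors.

```python
import math


def rewarm_redecay_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at `step` for one continual pre-training stage.

    Assumed shape (not taken from the paper): linear warm-up, cosine decay.
    """
    if step < warmup_steps:
        # Re-warming: ramp linearly from min_lr up to peak_lr.
        return min_lr + (peak_lr - min_lr) * (step + 1) / warmup_steps
    # Re-decaying: cosine decay from peak_lr back down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def task_arithmetic_merge(base, adapted_models, scale=1.0):
    """Merge adapted models in weight space.

    For each parameter: merged = base + scale * sum(adapted_i - base).
    `base` and each entry of `adapted_models` map parameter names to values.
    """
    merged = {}
    for name, base_value in base.items():
        # A task vector is the difference between adapted and base weights.
        task_vector_sum = sum(model[name] - base_value for model in adapted_models)
        merged[name] = base_value + scale * task_vector_sum
    return merged


if __name__ == "__main__":
    # Toy schedule: 10 warm-up steps inside a 100-step stage.
    for step in (0, 9, 10, 55, 100):
        print(step, rewarm_redecay_lr(step, 100, 10, peak_lr=1e-4, min_lr=1e-6))

    # Toy merge: one scalar "weight" adapted separately on three traditions.
    base = {"w": 1.0}
    adapted = [{"w": 1.5}, {"w": 0.5}, {"w": 2.0}]
    print(task_arithmetic_merge(base, adapted, scale=0.5))
```

With real checkpoints the same merge would run over each tensor in a model's state dictionary, and the scaling coefficient would typically be tuned on held-out data.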
