AI · Apr 9

DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues

Joonhyeok Shin, Jaehoon Kang, Yujun Lee, Hannah Lee, Yejin Lee, Yoonji Park, Kyuhong Shim

arXiv:2604.07895 · 81.1 · h-index: 1 · Has Code

Predicted impact: top 24% in AI · last 90 days · Originality: Synthesis-oriented

AI Analysis

The work addresses the need for automated background music recommendation in media and interactive systems, though its contribution is incremental: it offers a benchmark rather than a novel solution.

The paper tackles the problem of selecting appropriate background music for multi-turn dialogues that contain no explicit music descriptors. It introduces the DialBGM benchmark of 1,200 dialogues with human-ranked candidate music clips and finds that current models perform poorly: none exceeds 35% Hit@1 in selecting the top-ranked clip.

Selecting appropriate background music (BGM) to support natural human conversation is a common production step in media and interactive systems. In this paper, we introduce dialogue-conditioned BGM recommendation, where a model should select non-intrusive, fitting music for a multi-turn conversation that often contains no music descriptors. To study this novel problem, we present DialBGM, a benchmark of 1,200 open-domain daily dialogues, each paired with four candidate music clips and annotated with human preference rankings. Rankings are determined by background suitability criteria, including contextual relevance, non-intrusiveness, and consistency. We evaluate a wide range of open-source and proprietary models, including audio-language models and multimodal LLMs, and show that current models fall far short of human judgments; no model exceeds 35% Hit@1 when selecting the top-ranked clip. DialBGM provides a standardized benchmark for developing discourse-aware methods for BGM selection and for evaluating both retrieval-based and generative models.
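
For readers unfamiliar with the metric, Hit@1 can be sketched as the fraction of dialogues for which a model's single chosen clip matches the human top-ranked clip. The snippet below is a minimal illustration, not the authors' evaluation code: the function name hit_at_1, the integer clip indices, and the toy data are all assumptions made for the example. With four candidates per dialogue and one top-ranked clip, a uniform random guess would score about 25%, which gives some context for the 35% ceiling reported above.

```python
from typing import Sequence


def hit_at_1(predicted: Sequence[int], top_ranked: Sequence[int]) -> float:
    """Fraction of dialogues where the chosen clip is the human top-ranked clip.

    Both arguments hold one clip index per dialogue (hypothetical encoding).
    """
    if len(predicted) != len(top_ranked):
        raise ValueError("predicted and top_ranked must have the same length")
    if not predicted:
        raise ValueError("need at least one dialogue")
    hits = sum(p == t for p, t in zip(predicted, top_ranked))
    return hits / len(predicted)


# Toy data (hypothetical, not from DialBGM): clip indices 0..3 for five dialogues.
model_choice = [0, 2, 1, 3, 3]
human_top = [0, 1, 1, 2, 3]

score = hit_at_1(model_choice, human_top)
print(f"Hit@1 = {score:.2f}")  # 3 of 5 choices match, so this prints "Hit@1 = 0.60"
```

The same function applies unchanged to a full 1,200-dialogue run, provided each model output is first reduced to a single chosen clip index.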
