CL, Jan 7

PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models

Yuwen Wang, Xinyuan Qian, Tian-Hao Zhang, Jiaran Gao, Yuchen Pan, Xin Wang, Zhou Pan, Chen Wei, Yiming Wang

arXiv:2601.03531v1, 1 citation, h-index: 3, Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for personalized audio-language understanding in multi-speaker scenarios, but the contribution is incremental: it builds on existing LALMs, adding a new benchmark and task formalization.

The paper tackles the problem that large audio-language models (LALMs) behave generically and fail to support personalized question answering, such as summarizing what a specific individual says. It formalizes the task of Personalized LALMs (PALM) and introduces the PALM-Bench benchmark. The results show that existing training-free prompting and supervised fine-tuning strategies yield improvements but remain limited in modeling personalized knowledge and transferring it robustly across tasks.

Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and fails to adequately support personalized question answering (e.g., summarizing what my best friend says). In contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Moreover, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and enable structured evaluation on several tasks across multi-speaker scenarios. Our extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised fine-tuning strategies, while yielding improvements, remain limited in modeling personalized knowledge and transferring it robustly across tasks. Data and code will be released.
