AS | AI | CL | SD | May 21, 2025

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

arXiv:2505.15957v3 | 45 citations | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This survey addresses the need for structured evaluation methods in the audio-language model field, though its contribution is incremental: it synthesizes existing work rather than introducing new models or methods.

The paper tackles the fragmented evaluation of large audio-language models by surveying existing benchmarks and proposing a systematic taxonomy with four dimensions. The authors describe it as the first survey focused specifically on the evaluation of these models, intended to give the community clear guidelines.

With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benchmarks have emerged to assess LALMs' performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed overviews within each category and highlight challenges in this field, offering insights into promising future directions. To the best of our knowledge, this is the first survey specifically focused on the evaluations of LALMs, providing clear guidelines for the community. We will release the collection of the surveyed papers and actively maintain it to support ongoing advancements in the field.

Code Implementations: 2 repos