SDCL | Oct 1, 2025

Hearing the Order: Investigating Selection Bias in Large Audio-Language Models

arXiv:2510.00628v1 | 4 citations | h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses a reliability problem for users of LALMs in tasks involving ordered options, representing an incremental investigation of a known bias in a specific domain.

The paper investigates selection bias in large audio-language models (LALMs) caused by the order of answer choices, showing that shuffling options can lead to performance fluctuations of up to 24% and alter model rankings, undermining evaluation reliability. It also explores permutation-based strategies to mitigate this bias.

Large audio-language models (LALMs) are often used in tasks that involve reasoning over ordered options. An open question is whether their predictions are influenced by the order of answer choices, which would indicate a form of selection bias and undermine their reliability. In this paper, we identify and analyze this problem in LALMs. We demonstrate that no model is immune to this bias through extensive experiments on six LALMs across three widely used benchmarks and their spoken counterparts. Shuffling the order of answer options can cause performance fluctuations of up to 24% and even change model rankings, raising concerns about the reliability of current evaluation practices. We also study permutation-based strategies and show that they can mitigate bias in most cases. Our work represents the first systematic investigation of this issue in LALMs, and we hope it raises awareness and motivates further research in this direction.
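The two ideas in the abstract, measuring how accuracy moves when answer options are reordered and aggregating predictions over permutations, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say which permutation strategies were used, so majority voting over cyclic shifts is assumed here as one simple variant. The model interface, the toy model, and all function names are hypothetical, and a real LALM would take audio input rather than strings.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical model interface (an assumption, not from the paper):
# takes a question and an ordered list of options, returns the chosen position.
Model = Callable[[str, Sequence[str]], int]


def cyclic_orders(n: int) -> list:
    """All n cyclic shifts of the option indices 0..n-1."""
    return [tuple((i + s) % n for i in range(n)) for s in range(n)]


def predict_with_order(model: Model, question: str, options, order) -> int:
    """Present options in `order`; map the chosen position back to the original index."""
    shuffled = [options[i] for i in order]
    return order[model(question, shuffled)]


def accuracy_per_order(model: Model, dataset, orders) -> dict:
    """Accuracy under each ordering; dataset items are (question, options, answer_index)."""
    return {
        tuple(order): sum(
            predict_with_order(model, q, opts, order) == ans
            for q, opts, ans in dataset
        ) / len(dataset)
        for order in orders
    }


def permutation_vote(model: Model, question: str, options, orders) -> int:
    """Majority vote over the predictions made under each ordering."""
    votes = Counter(predict_with_order(model, question, options, o) for o in orders)
    return votes.most_common(1)[0][0]


def biased_toy_model(question: str, options) -> int:
    """Toy stand-in with a position bias: finds the option "right" unless it
    sits in the last position, in which case it falls back to the first one."""
    pos = list(options).index("right")
    return pos if pos != len(options) - 1 else 0


# Tiny illustrative dataset (made up for this sketch).
toy_dataset = [
    ("q1", ["right", "w1", "w2", "w3"], 0),
    ("q2", ["right", "w1", "w2", "w3"], 0),
    ("q3", ["w1", "right", "w2", "w3"], 1),
    ("q4", ["w1", "w2", "w3", "right"], 3),
]

orders = cyclic_orders(4)
acc = accuracy_per_order(biased_toy_model, toy_dataset, orders)
spread = max(acc.values()) - min(acc.values())  # fluctuation caused by ordering alone
voted = [permutation_vote(biased_toy_model, q, opts, orders) for q, opts, _ in toy_dataset]
```

On this toy data the accuracy depends only on which cyclic shift is used, while the voted predictions recover the intended answer for every item. Voting over all orderings multiplies inference cost by the number of orderings, which is why the sketch uses cyclic shifts rather than all 24 permutations of four options.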

