SD Mar 31

Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models

Ashish Seth, Sonal Kumar, Ramaneswaran Selvakumar, Nishit Anand, Utkarsh Tyagi, Prem Seetharaman, Ramani Duraiswami, Dinesh Manocha

arXiv:2603.29263 | 98.3 | h-index: 57

AI Analysis

This paper addresses a critical reliability gap for users of audio-language AI systems by exposing vulnerabilities that standard benchmarks miss. The contribution is incremental, however, since it builds on existing attack and mitigation frameworks.

The paper probes the reliability of Large Audio Language Models (LALMs) in real-world settings by introducing Audio Hallucination Attacks (AHA), which reach attack success rates of 95.35% on Audio Flamingo 3 and 79.65% on Gemini 3 Pro. It also proposes a post-alignment dataset, AHA-Guard, that reduces these rates by up to 49%.

Large Audio Language Models (LALMs) achieve strong performance on audio-language tasks; however, their reliability in real-world settings remains underexplored. We introduce Audio Hallucination Attacks (AHA), an attack suite called AHA-Eval, comprising 6.5K QA pairs designed to test whether LALMs genuinely ground their responses in the audio input. AHA targets two attack surfaces: (i) query-based attacks, which exploit question structure to induce hallucinations about absent sounds, and (ii) audio-based attacks, which inject synthetic speech describing non-existent events into the audio stream. Evaluating state-of-the-art LALMs, including Audio Flamingo 3 and Gemini 3 Pro, we observe high attack success rates of 95.35% and 79.65%, respectively, revealing a reliability gap that is hidden by standard benchmark performance. To mitigate this, we propose a 120K QA post-alignment dataset, AHA-Guard, which successfully reduces attack success rates by up to 49%.
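
The abstract reports attack success rates but does not say how they are computed. A minimal sketch follows, assuming the rate is the percentage of attack QA pairs on which the model asserts a sound or event that is not in the audio. The `AttackResult` record, its field names, and the toy data are hypothetical and are not taken from the paper or from AHA-Eval.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    # Hypothetical record for one attack QA pair; not the paper's schema.
    attack_type: str    # "query" or "audio", the two attack surfaces in the abstract
    hallucinated: bool  # True if the model affirmed the absent sound or event


def attack_success_rate(results):
    """Percentage of attack QA pairs on which the model hallucinated (assumed definition)."""
    if not results:
        raise ValueError("no results to score")
    hits = sum(1 for r in results if r.hallucinated)
    return 100.0 * hits / len(results)


def asr_by_type(results):
    """Attack success rate computed separately for each attack surface."""
    grouped = {}
    for r in results:
        grouped.setdefault(r.attack_type, []).append(r)
    return {kind: attack_success_rate(rs) for kind, rs in grouped.items()}


# Toy, made-up outcomes for illustration only.
results = [
    AttackResult("query", True),
    AttackResult("query", True),
    AttackResult("query", False),
    AttackResult("audio", True),
]
overall = attack_success_rate(results)
per_type = asr_by_type(results)
```

Splitting the score by attack surface, as `asr_by_type` does, would show whether a model is more easily misled by question structure or by injected speech. Whether the paper reports its numbers this way is not stated in the abstract.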
