SD, LG, AS · Oct 14, 2025

Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models

arXiv:2510.12851v1 · 3 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses hallucination in large audio and multimodal models, offering a practical, training-free way to improve reliability in applications such as audio question answering.

The paper tackles hallucination in large audio and multimodal models by proposing Adaptive Vector Steering (AVS), a training-free intervention that improves performance on both benchmarks tested; for example, it raises Gemma's F1-score from 0.550 to 0.619 on the Audio Hallucination QA dataset.
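Vector steering, in general, changes a model's behaviour at inference time by adding a direction to its hidden states, with no weight updates, which is what makes such an intervention "training-free". The following NumPy snippet is a minimal sketch of that generic idea, not the paper's AVS procedure. The difference-of-means construction, the function names `mean_difference_vectors` and `steer`, and the per-layer `layer_scales` (a hypothetical stand-in for whatever adaptive, layer-wise rule AVS actually uses) are all assumptions for illustration.

```python
import numpy as np


def mean_difference_vectors(pos_acts, neg_acts):
    """Per-layer unit steering vectors from the difference of mean activations.

    pos_acts, neg_acts: arrays of shape (num_layers, num_examples, hidden_dim),
    e.g. hidden states collected on grounded vs. hallucinated outputs
    (a hypothetical data-collection setup, not taken from the paper).
    Returns an array of shape (num_layers, hidden_dim).
    """
    diff = pos_acts.mean(axis=1) - neg_acts.mean(axis=1)
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / np.maximum(norms, 1e-8)  # guard against zero-length vectors


def steer(hidden, vectors, layer_scales):
    """Add a scaled steering vector to every token position at each layer.

    hidden: (num_layers, seq_len, hidden_dim)
    vectors: (num_layers, hidden_dim)
    layer_scales: (num_layers,) hypothetical per-layer steering strengths.
    """
    return hidden + layer_scales[:, None, None] * vectors[:, None, :]


# Usage on synthetic activations (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
L, N, D, T = 4, 32, 16, 5
neg = rng.normal(size=(L, N, D))
pos = neg + 0.5  # synthetic shift so the mean difference is known
vecs = mean_difference_vectors(pos, neg)
h = rng.normal(size=(L, T, D))
scales = np.linspace(0.0, 1.0, L)  # layer 0 left unsteered
out = steer(h, vecs, scales)
print(out.shape)
```

Keeping the strength as a separate per-layer array makes it easy to steer only some layers (a zero scale leaves a layer untouched), which is the kind of layer-wise control the paper's title refers to.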

Large Audio-Language Models and Multi-Modal Large Language Models have demonstrated strong capabilities in tasks such as Audio Question Answering (AQA), Audio Captioning, and Automatic Speech Recognition (ASR). However, there is growing evidence that these models can hallucinate about the content of the audio. To address this issue, we probe the models' internal states and propose Adaptive Vector Steering (AVS), a method that better grounds generation in audio content. We also identify a strong correlation between output correctness and internal representations. Experiments show consistent performance gains across two models and two benchmarks. On the Audio Hallucination QA dataset, our method boosts the F1-score of Gemma from 0.550 to 0.619 and Qwen from 0.626 to 0.632. Furthermore, our method increases the accuracy of Qwen on MMAU from 0.548 to 0.592, marking an 8% relative increase. To the best of our knowledge, this is the first work to apply vector steering to mitigate hallucination in audio.
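The abstract reports absolute scores and one relative figure (8% on MMAU). A small Python check, using only the numbers quoted above, makes the size of each gain explicit; the dictionary labels and the `relative_gain` helper are illustrative names, not part of the paper.

```python
# (baseline, with AVS) scores as reported in the abstract.
results = {
    "Gemma, Audio Hallucination QA (F1)": (0.550, 0.619),
    "Qwen, Audio Hallucination QA (F1)": (0.626, 0.632),
    "Qwen, MMAU (accuracy)": (0.548, 0.592),
}


def relative_gain(before, after):
    """Improvement as a fraction of the baseline score."""
    return (after - before) / before


for name, (before, after) in results.items():
    print(f"{name}: +{after - before:.3f} absolute, "
          f"{relative_gain(before, after):.1%} relative")
```

This works out to about 12.5% relative for Gemma on Audio Hallucination QA, about 1.0% for Qwen on the same dataset, and 8.0% for Qwen on MMAU, matching the abstract's stated 8%.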
