EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors
EnsemHalDet addresses unreliable VLM outputs, a practical concern for users of multimodal AI applications, but the contribution is incremental because it builds on existing internal-representation-based methods.
The paper tackles hallucination detection in Vision-Language Models by proposing EnsemHalDet, an ensemble framework that combines detectors trained on multiple internal representations. It reports consistently higher AUC than prior methods across datasets and models.
Vision-Language Models (VLMs) excel at multimodal tasks, but they remain vulnerable to hallucinations that are factually incorrect or ungrounded in the input image. Recent work suggests that hallucination detection using internal representations is more efficient and accurate than approaches that rely solely on model outputs. However, existing internal-representation-based methods typically rely on a single representation or detector, limiting their ability to capture diverse hallucination signals. In this paper, we propose EnsemHalDet, an ensemble-based hallucination detection framework that leverages multiple internal representations of VLMs, including attention outputs and hidden states. EnsemHalDet trains independent detectors for each representation and combines them through ensemble learning. Experimental results across multiple VQA datasets and VLMs show that EnsemHalDet consistently outperforms prior methods and single-detector models in terms of AUC. These results demonstrate that ensembling diverse internal signals significantly improves robustness in multimodal hallucination detection.
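The abstract does not specify the detector architecture or the ensembling rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes logistic-regression detectors, simple averaging of their probabilities, and synthetic Gaussian features standing in for attention outputs and hidden states. All function names (`train_detector`, `make_view`, `run_demo`) and parameters are hypothetical.

```python
import math
import random


def predict_proba(w, b, x):
    """Sigmoid score of one feature vector under a linear detector."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_detector(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression detector with batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_view(labels, dim, shift, rng):
    """Synthetic stand-in for one internal representation (assumption)."""
    return [[rng.gauss(shift * y, 1.0) for _ in range(dim)] for y in labels]


def run_demo(seed=0, n=200, n_train=150):
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # 1 = hallucinated (toy labels)
    # Two hypothetical representations; real ones would be extracted from a VLM.
    views = {
        "attention_output": make_view(labels, 4, 1.0, rng),
        "hidden_state": make_view(labels, 4, 1.0, rng),
    }
    y_train, y_test = labels[:n_train], labels[n_train:]

    # Train one independent detector per representation.
    per_view_scores = {}
    for name, X in views.items():
        w, b = train_detector(X[:n_train], y_train)
        per_view_scores[name] = [predict_proba(w, b, x) for x in X[n_train:]]

    # Ensemble by averaging detector probabilities (assumed combination rule).
    ensemble = [sum(s) / len(s) for s in zip(*per_view_scores.values())]

    result = {name: auc(s, y_test) for name, s in per_view_scores.items()}
    result["ensemble"] = auc(ensemble, y_test)
    return result, ensemble


if __name__ == "__main__":
    aucs, _ = run_demo()
    for name, value in aucs.items():
        print(f"{name}: AUC = {value:.3f}")
```

Averaging is only one possible combination rule; weighted voting or a stacked meta-classifier would fit the same structure by replacing the single line that computes `ensemble`.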