CV · AI · Nov 27, 2024

DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models

Yudong Zhang, Ruobing Xie, Xingwu Sun, Yiqing Huang, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang

Tsinghua

arXiv:2411.18659v2 · 12.88 citations · h-index: 11 · Has Code · MM

Originality: Incremental advance

AI Analysis

This work addresses the reliability and trustworthiness of large vision-language models, which is crucial for their safe deployment. The contribution is incremental, since it builds on existing attention mechanisms.

The paper tackles hallucination in large vision-language models by developing a lightweight detector that identifies hallucinations from differences in cross-modal attention patterns between hallucination and non-hallucination states. The method requires no additional LVLM training and no extra LVLM inference steps, and the authors report strong detection performance.

Large vision-language models (LVLMs) have demonstrated exceptional performance on complex multimodal tasks. However, they continue to suffer from significant hallucination issues, including object, attribute, and relational hallucinations. To accurately detect these hallucinations, we investigated the variations in cross-modal attention patterns between hallucination and non-hallucination states. Leveraging these distinctions, we developed a lightweight detector capable of identifying hallucinations. Our proposed method, Detecting Hallucinations by Cross-modal Attention Patterns (DHCP), is straightforward and does not require additional LVLM training or extra LVLM inference steps. Experimental results show that DHCP achieves remarkable performance in hallucination detection. By offering novel insights into the identification and analysis of hallucinations in LVLMs, DHCP contributes to advancing the reliability and trustworthiness of these models. The code is available at https://github.com/btzyd/DHCP.
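To make the idea of a lightweight detector over cross-modal attention concrete, here is a minimal sketch. It is not the authors' implementation (see their repository for that). Everything in it is an assumption chosen for illustration: the tensor layout `(samples, layers, heads, image_tokens)`, the per-head attention-mass feature, the logistic-regression classifier, and above all the synthetic data, in which hallucinated samples are simply given less attention on image tokens. The function names are hypothetical.

```python
import numpy as np


def attention_features(attn):
    """Turn attention maps into one feature vector per sample.

    attn: array of shape (n_samples, n_layers, n_heads, n_image_tokens),
    assumed to hold attention from a generated token to the image tokens.
    Returns the attention mass per (layer, head), flattened.
    """
    mass = attn.sum(axis=-1)                  # (n_samples, n_layers, n_heads)
    return mass.reshape(attn.shape[0], -1)    # (n_samples, n_layers * n_heads)


def train_logistic_detector(x, y, lr=0.5, steps=500):
    """Fit a logistic-regression detector with plain gradient descent."""
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    xn = (x - mean) / std                     # standardize features
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xn @ w + b)))
        grad = p - y                          # gradient of the log loss w.r.t. logits
        w -= lr * (xn.T @ grad) / len(y)
        b -= lr * grad.mean()
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict_hallucination(model, x):
    """Return the predicted probability that each sample is a hallucination."""
    xn = (x - model["mean"]) / model["std"]
    return 1.0 / (1.0 + np.exp(-(xn @ model["w"] + model["b"])))


def make_synthetic_attention(n, rng, n_layers=4, n_heads=8, n_image_tokens=16):
    """Synthetic stand-in for real LVLM attention (label 1 = hallucination).

    ASSUMPTION for the demo only: non-hallucinated samples receive extra
    attention on image tokens. This is not a result taken from the paper.
    """
    labels = rng.integers(0, 2, size=n)
    attn = rng.uniform(0.0, 0.02, size=(n, n_layers, n_heads, n_image_tokens))
    attn += 0.01 * (1 - labels)[:, None, None, None]
    return attn, labels.astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn_train, y_train = make_synthetic_attention(200, rng)
    attn_test, y_test = make_synthetic_attention(100, rng)
    model = train_logistic_detector(attention_features(attn_train), y_train)
    probs = predict_hallucination(model, attention_features(attn_test))
    accuracy = ((probs > 0.5) == (y_test == 1)).mean()
    print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In a real setting the attention maps would come from the LVLM's existing forward pass rather than from `make_synthetic_attention`, which is consistent with the abstract's point that no extra LVLM inference is needed; only the small classifier is fitted.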
