CV · May 24, 2024

Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

Xinyu Lyu, Beitao Chen, Lianli Gao, Jingkuan Song, Heng Tao Shen

arXiv:2405.15356v3 · 23.346 citations · h-index: 47 · Has Code · NeurIPS

Originality: Incremental advance

AI Analysis

This paper addresses hallucinations in vision-language models and offers an incremental improvement over existing contrastive decoding methods.

The paper tackles hallucinations in Large Vision-Language Models by proposing Hallucination-Induced Optimization (HIO), which the authors report reduces hallucinations and outperforms state-of-the-art methods on several benchmarks.

Although Large Vision-Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, which create a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropriately widens the contrastive logit gap between hallucinatory and targeted tokens. However, due to the uncontrollable nature of global visual uncertainty, these methods struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations and may even lead to the generation of undesired hallucinations. To tackle this issue, we conducted a theoretical analysis aimed at improving the effectiveness of contrastive decoding. Building on this insight, we introduce a novel optimization strategy named Hallucination-Induced Optimization (HIO). This strategy seeks to amplify the contrast between hallucinatory and targeted tokens by relying on a fine-tuned theoretical preference model (i.e., the Contrary Bradley-Terry Model), thereby facilitating efficient contrastive decoding to alleviate hallucinations in LVLMs. Extensive experiments demonstrate that HIO effectively reduces hallucinations in LVLMs, outperforming state-of-the-art methods across various benchmarks.
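The abstract names two mechanisms without giving formulas. The following is a minimal sketch, under stated assumptions, of how they might fit together: a Bradley-Terry preference probability with the order flipped so that a hallucinatory output is the "preferred" one (our reading of "Contrary Bradley-Terry Model"), and a contrastive-decoding step in the style of visual contrastive decoding, where the log-probabilities of a hallucination-inducing model are subtracted from those of the base model. The function names, the scalar rewards, `alpha`, `beta`, and the plausibility cutoff are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def _sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bradley_terry_prob(r_preferred: float, r_other: float) -> float:
    """Standard Bradley-Terry: P(preferred beats other) = sigmoid(r_preferred - r_other)."""
    return _sigmoid(r_preferred - r_other)


def contrary_bt_loss(r_target: float, r_halluc: float) -> float:
    """Hypothetical 'contrary' objective (assumption): the usual Bradley-Terry
    negative log-likelihood with the preference order swapped, so minimizing it
    pushes a model to rank the hallucinatory output above the target one."""
    return -math.log(bradley_terry_prob(r_halluc, r_target))


def contrastive_decode(base_logits: List[float], induced_logits: List[float],
                       alpha: float = 1.0, beta: float = 0.1) -> List[float]:
    """Score each token as (1 + alpha) * log p_base - alpha * log p_induced.

    Tokens whose base probability is below beta * (max base probability)
    are masked to -inf (a plausibility constraint, assumed here), so the
    subtraction cannot promote tokens the base model finds implausible.
    """
    lp_base = log_softmax(base_logits)
    lp_ind = log_softmax(induced_logits)
    cutoff = max(lp_base) + math.log(beta)
    scores = []
    for b, h in zip(lp_base, lp_ind):
        if b < cutoff:
            scores.append(float("-inf"))  # implausible under the base model
        else:
            scores.append((1 + alpha) * b - alpha * h)
    return scores
```

As a toy usage example, take tokens (cat, dog, frisbee) with base logits `[1.8, 2.0, 0.0]` and an inducing model that strongly favors the hallucinated `dog`, with logits `[0.0, 3.0, 0.0]`. Greedy decoding on the base model alone picks `dog`, whereas `contrastive_decode(base, induced)` ranks `cat` highest. The sharper the inducing model concentrates on hallucinatory tokens, the larger this corrective gap, which matches the abstract's motivation for inducing hallucinations precisely rather than through global visual noise.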
