CV · AI · Mar 1, 2025

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

arXiv:2503.00361v1 · 27 citations · h-index: 10 · Has Code · CVPR
Originality: Highly original
AI Analysis

This work addresses the critical issue of hallucination in vision-language models, which causes them to generate fabricated responses, and offers a deployable solution with broad applicability.

The paper tackles the hallucination problem in Large Vision-Language Models by introducing Octopus, a dynamic contrastive decoding framework that adaptively identifies hallucination types and builds a tailored contrastive decoding workflow for each generation step, achieving state-of-the-art performance across four benchmarks.

Large Vision-Language Models (LVLMs) have achieved impressive performance in visual content understanding and multi-modal reasoning. Unfortunately, these models suffer from serious hallucination problems and tend to generate fabricated responses. Recently, several Contrastive Decoding (CD) strategies have been proposed to alleviate hallucination by contrasting the model's outputs on the original input with its outputs on disturbed inputs. Although these strategies have made great progress, most of them apply a one-size-fits-all approach to all input conditions. In this paper, we revisit this process through extensive experiments. The results show that hallucination has hybrid causes and that each generative step faces a unique hallucination challenge. Leveraging these insights, we introduce a simple yet effective Octopus-like framework that enables the model to adaptively identify hallucination types and create a dynamic CD workflow. Our Octopus framework not only outperforms existing methods across four benchmarks but also demonstrates excellent deployability and extensibility. Code is available at https://github.com/LijunZhang01/Octopus.
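To make the contrast between a fixed CD strategy and a per-step dynamic one concrete, here is a minimal sketch of a single decoding step. It assumes a common contrastive decoding formulation (used, e.g., by Visual Contrastive Decoding): score each token by (1 + alpha) * log p(y | original input) - alpha * log p(y | disturbed input), restricted to tokens whose original probability is at least beta times the maximum. All names here (`contrastive_step`, `dynamic_step`, `toy_selector`, the strategy key `"visual_noise"`) and the 0.9 confidence threshold are hypothetical. The toy selector only stands in for Octopus's adaptive identification of hallucination types. It is not the paper's mechanism.

```python
import math
from typing import Callable, Dict, List, Optional


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def contrastive_step(logits_orig: List[float], logits_disturbed: List[float],
                     alpha: float = 1.0, beta: float = 0.1) -> int:
    """One contrastive decoding step (assumed VCD-style formulation)."""
    lp_o = log_softmax(logits_orig)
    lp_d = log_softmax(logits_disturbed)
    # Plausibility cutoff: keep tokens with p_orig >= beta * max p_orig.
    cutoff = math.log(beta) + max(lp_o)
    best_tok, best_score = -1, -math.inf
    for tok, (o, d) in enumerate(zip(lp_o, lp_d)):
        if o < cutoff:
            continue  # implausible under the original input; never selected
        score = (1 + alpha) * o - alpha * d
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok


def toy_selector(logits_orig: List[float]) -> Optional[str]:
    """HYPOTHETICAL per-step rule: skip CD when the model is confident."""
    p_max = math.exp(max(log_softmax(logits_orig)))
    return None if p_max >= 0.9 else "visual_noise"


def dynamic_step(logits_orig: List[float],
                 disturbed: Dict[str, List[float]],
                 select_strategy: Callable[[List[float]], Optional[str]],
                 alpha: float = 1.0, beta: float = 0.1) -> int:
    """Choose a CD strategy (or none) for this step, then decode one token."""
    choice = select_strategy(logits_orig)
    if choice is None:
        # No contrast needed: plain greedy decoding.
        return max(range(len(logits_orig)), key=lambda i: logits_orig[i])
    return contrastive_step(logits_orig, disturbed[choice], alpha, beta)
```

With `logits_orig = [2.0, 1.9, -1.0]`, greedy decoding picks token 0. If the disturbed input strongly favors token 0 (for example `[2.5, 0.0, -1.0]`), contrastive decoding switches to token 1, which the original input supports but the disturbed one does not. The per-step selector is the only difference from a fixed CD strategy. A real system would replace `toy_selector` with a learned decision and supply several disturbed-input distributions.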
