CV · AI · Aug 14, 2025

MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs

Haonan Ge, Yiwei Wang, Ming-Hsuan Yang, Yujun Cai

arXiv:2508.10264v2 · 9 citations · h-index: 19 · EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses hallucinations in LVLMs on multimodal tasks, but its contribution is incremental because it builds on existing decoding techniques.

The paper tackles the problem of hallucinations in Large Vision-Language Models (LVLMs) by proposing MRFD, a training-free decoding method that reduces hallucinations and improves factuality, as shown in experiments across multiple models and benchmarks.

Large Vision-Language Models (LVLMs) have shown strong performance across multimodal tasks. However, they often produce hallucinations, that is, text inconsistent with the visual input, because of their limited ability to verify information across different regions of the image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates initial responses for each, and computes reliability weights based on Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality without requiring model updates.
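The weighting and fusion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each region's initial response is summarized as a probability distribution over a shared vocabulary, that a region's reliability is a softmax over its negative mean JSD to the other regions, and that fusion is a weighted sum of per-region next-token logits. The function names `jsd`, `reliability_weights`, and `fuse_logits`, and the `temperature` parameter, are hypothetical and chosen only for illustration.

```python
import numpy as np


def jsd(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 because b is the mixture.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def reliability_weights(dists, temperature=1.0):
    """Assumed rule: a region that agrees with the others (low mean JSD) gets more weight.

    Requires at least two regions.
    """
    n = len(dists)
    mean_jsd = np.array([
        np.mean([jsd(dists[i], dists[j]) for j in range(n) if j != i])
        for i in range(n)
    ])
    scores = -mean_jsd / temperature
    scores = scores - scores.max()  # stabilize the softmax
    w = np.exp(scores)
    return w / w.sum()


def fuse_logits(region_logits, weights):
    """Assumed fusion: weighted sum of per-region next-token logits, shape (vocab,)."""
    return np.tensordot(weights, np.asarray(region_logits, dtype=float), axes=1)


# Toy example: regions 0 and 1 roughly agree, region 2 is an outlier.
region_dists = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
weights = reliability_weights(region_dists)
fused = fuse_logits(np.log(region_dists), weights)
```

In this toy setup the outlier region receives the smallest weight, so its prediction contributes least to the fused logits. A lower `temperature` sharpens that preference, and a higher one moves the weights toward uniform.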
