CV AI CL Mar 19, 2025

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang

arXiv:2503.14895v114.410 citations h-index: 40 EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the authenticity of MLLM responses in visual-language tasks. It is an incremental improvement that can be combined with existing methods.

The paper tackles object hallucinations in multimodal large language models (MLLMs) by introducing Multi-Frequency Perturbations (MFP), a method that perturbs visual feature representations using the low-frequency and high-frequency components of the image. Combined with inference-time methods, MFP reaches state-of-the-art performance on the CHAIR benchmark.

Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance in visual-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify that a key cause of these hallucinations is the model's over-susceptibility to specific image frequency features in detecting objects. In this paper, we introduce Multi-Frequency Perturbations (MFP), a simple, cost-effective, and pluggable method that leverages both low-frequency and high-frequency features of images to perturb visual feature representations and explicitly suppress redundant frequency-domain features during inference, thereby mitigating hallucinations. Experimental results demonstrate that our method significantly mitigates object hallucinations across various model architectures. Furthermore, as a training-time method, MFP can be combined with inference-time methods to achieve state-of-the-art performance on the CHAIR benchmark.
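To make the notion of low-frequency and high-frequency image features concrete, the sketch below splits a grayscale image into the two components with a 2D FFT and an ideal circular low-pass mask. This is only an illustration of a frequency decomposition, not the authors' implementation: the function name `split_frequencies`, the `radius` cutoff, and the choice of an ideal mask are all assumptions, and the abstract does not say how the components are used to perturb the visual features, so that step is left out.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2D grayscale image into low- and high-frequency parts.

    `radius` is a hypothetical cutoff (in frequency bins) for an ideal
    circular low-pass mask; the paper's actual filter may differ.
    """
    h, w = image.shape
    # Centered spectrum: the DC component sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the DC component.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius

    # Inverse-transform each band separately; the two masks are
    # complementary, so the parts sum back to the input image.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


# Usage on a random stand-in image.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequencies(img, radius=4)
```

The low-frequency part keeps the overall brightness and coarse layout, while the high-frequency part keeps edges and fine texture. A smaller `radius` moves more of the image content into the high-frequency part.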

