CV · AI · CL · Mar 19, 2025

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

arXiv:2503.14895v1 · 10 citations · h-index: 40 · EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the authenticity of MLLM responses in visual-language tasks. It is an incremental advance: the proposed method is designed to be combined with existing inference-time methods.

The paper tackles the problem of object hallucinations in multimodal large language models (MLLMs) by introducing Multi-Frequency Perturbations (MFP), a method that reduces hallucinations by perturbing visual features based on image frequencies, achieving state-of-the-art performance on the CHAIR benchmark.

Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance in visual-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify that a key cause of these hallucinations is the model's over-susceptibility to specific image frequency features in detecting objects. In this paper, we introduce Multi-Frequency Perturbations (MFP), a simple, cost-effective, and pluggable method that leverages both low-frequency and high-frequency features of images to perturb visual feature representations and explicitly suppress redundant frequency-domain features during inference, thereby mitigating hallucinations. Experimental results demonstrate that our method significantly mitigates object hallucinations across various model architectures. Furthermore, as a training-time method, MFP can be combined with inference-time methods to achieve state-of-the-art performance on the CHAIR benchmark.
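The abstract refers to low-frequency and high-frequency image features without saying how they are obtained. As a rough illustration only, and not the authors' implementation, the sketch below splits a grayscale image into the two bands with a circular mask in the Fourier domain. The function name `split_frequencies` and the `radius` cutoff are assumptions introduced for this example.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: float):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Illustrative sketch: `radius` is a hypothetical cutoff (in frequency
    bins) and is not a parameter taken from the paper.
    """
    h, w = image.shape
    # Centered 2-D spectrum: the zero-frequency term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the spectrum center.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius

    # Keep bins inside the circle for the low band, outside for the high band.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high
```

Because the two masks are complementary, `low + high` reconstructs the input image up to floating-point error. How MFP then uses such components to perturb visual feature representations is described in the paper itself and is not reproduced here.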

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
