Suppressing VLM Hallucinations with Spectral Representation Filtering
This work addresses unreliable VLM outputs in applications that require accurate image descriptions. It is an incremental improvement that builds on existing methods rather than a paradigm shift.
The paper tackles hallucinations in vision-language models by introducing Spectral Representation Filtering (SRF), a training-free method that reduces hallucination rates across multiple VLM families on benchmarks such as MSCOCO and POPE-VQA, achieving state-of-the-art faithfulness without degrading caption quality.
Vision-language models (VLMs) frequently hallucinate: they describe objects, attributes, or relations that do not exist in the image, owing to over-reliance on language priors and imprecise cross-modal grounding. We introduce Spectral Representation Filtering (SRF), a lightweight, training-free method that suppresses such hallucinations by analyzing and correcting the covariance structure of the model's representations. SRF identifies low-rank hallucination modes through eigendecomposition of the covariance of the differences between features collected for truthful and hallucinatory captions, revealing structured biases in the feature space. A soft spectral filter then attenuates these modes in the feed-forward projection weights of deeper vLLM layers, equalizing feature variance while preserving semantic fidelity. Unlike decoding-based or retraining-based approaches, SRF operates entirely post hoc, incurs zero inference overhead, and requires no architectural modifications. Across three families of VLMs (LLaVA-1.5, MiniGPT-4, and mPLUG-Owl2), SRF consistently reduces hallucination rates on MSCOCO, POPE-VQA, and other visual task benchmarks, achieving state-of-the-art faithfulness without degrading caption quality.
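The two steps described in the abstract (eigendecomposition of the difference covariance, then soft attenuation of the leading modes in a projection weight) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the gain formula 1 / (1 + alpha * lambda_i / lambda_max), the parameter alpha, the choice to filter the output side of the weight, and the synthetic data are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def hallucination_modes(feats_truthful, feats_halluc, k):
    """Return the top-k eigenpairs of the covariance of feature differences.

    Both inputs are (n_samples, d) arrays of paired features.
    (Hypothetical helper; the pairing and centering are assumptions.)
    """
    diffs = feats_halluc - feats_truthful
    cov = np.cov(diffs, rowvar=False)        # (d, d), mean-centered
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvals[top], eigvecs[:, top]


def soft_spectral_filter(weight, eigvals, eigvecs, alpha=1.0):
    """Attenuate the given modes on the output side of a projection weight.

    weight is (d_out, d_in) with y = weight @ x, and the modes live in the
    d_out-dimensional feature space. The gain function is an assumed choice:
    stronger modes are shrunk more, and directions outside the span of the
    modes are left unchanged.
    """
    gains = 1.0 / (1.0 + alpha * eigvals / eigvals.max())
    filt = np.eye(weight.shape[0]) - (eigvecs * (1.0 - gains)) @ eigvecs.T
    return filt @ weight                     # same shape as the input weight


# Synthetic example: hallucinated features differ from truthful ones mainly
# along coordinate 0, so that coordinate should emerge as the leading mode.
rng = np.random.default_rng(0)
d, n = 8, 500
truthful = rng.normal(size=(n, d))
bias = np.zeros(d)
bias[0] = 3.0
halluc = truthful + rng.normal(size=(n, 1)) * bias + 0.01 * rng.normal(size=(n, d))

lam, modes = hallucination_modes(truthful, halluc, k=1)
w_filtered = soft_spectral_filter(np.eye(d), lam, modes)
```

Because the filter is folded into the weight matrix once, offline, the filtered weight has the same shape as the original, which is consistent with the abstract's claim of zero inference overhead.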