CV | AI | May 27, 2025

LPOI: Listwise Preference Optimization for Vision Language Models

arXiv:2505.21061v1 | 6 citations | h-index: 4 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This paper addresses hallucinations in vision-language models (VLMs), a critical issue for reliable AI applications, though it is an incremental improvement over existing preference optimization methods.

The paper tackles the problem of aligning vision-language models with human preferences to reduce hallucinations. It proposes LPOI, which uses object-aware listwise preference optimization, and reports superior performance on benchmarks such as MMHalBench, AMBER, and Object HalBench.
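A listwise objective generalizes pairwise DPO from one (preferred, rejected) pair to a whole ranked list. The sketch below assumes a Plackett-Luce likelihood over DPO-style implicit rewards, a common way to build listwise preference losses; it is not confirmed here as LPOI's exact loss. In LPOI's setting each list item would be the same response scored under a different image, ordered from most to least visible object. The function names and the `beta` default are hypothetical.

```python
import math
from typing import List, Sequence


def implicit_rewards(policy_logps: Sequence[float],
                     ref_logps: Sequence[float],
                     beta: float = 0.1) -> List[float]:
    """DPO-style implicit reward per list item: beta * (log pi_theta - log pi_ref).
    Assumption: one summed response log-prob per item, e.g. log p(y | image_i, question)."""
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def plackett_luce_loss(rewards_best_first: Sequence[float]) -> float:
    """Negative log-likelihood of the given ranking under a Plackett-Luce model.
    At each position, the top remaining item should win a softmax over the rest.
    With two items this reduces to the pairwise DPO loss -log sigmoid(r_w - r_l)."""
    loss = 0.0
    for i in range(len(rewards_best_first) - 1):  # last position contributes log(1) = 0
        tail = rewards_best_first[i:]
        loss -= tail[0] - logsumexp(tail)
    return loss
```

Ranking the list correctly (highest reward first) yields a lower loss than the reversed order, so minimizing this pushes the policy to assign higher implicit reward to the response when more of the object is visible.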

Aligning large VLMs with human preferences is a challenging task, as methods like RLHF and DPO often overfit to textual information or exacerbate hallucinations. Although augmenting negative image samples partially addresses these pitfalls, no prior work has employed listwise preference optimization for VLMs, due to the complexity and cost of constructing listwise image samples. In this work, we propose LPOI, the first object-aware listwise preference optimization developed for reducing hallucinations in VLMs. LPOI identifies and masks a critical object in the image, and then interpolates the masked region between the positive and negative images to form a sequence of incrementally more complete images. The model is trained to rank these images in ascending order of object visibility, effectively reducing hallucinations while retaining visual fidelity. LPOI requires no extra annotations beyond standard pairwise preference data, as it automatically constructs the ranked lists through object masking and interpolation. Comprehensive experiments on MMHalBench, AMBER, and Object HalBench confirm that LPOI outperforms existing preference optimization methods in reducing hallucinations and enhancing VLM performance. We make the code available at https://github.com/fatemehpesaran310/lpoi.
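The list construction described above (mask a critical object, then interpolate the masked region between the positive and negative images) can be sketched as follows. This assumes a simple reading: the negative image is the positive image with the object's pixels replaced by a constant, and intermediate images blend that region linearly. LPOI's actual masking and interpolation scheme may differ, and `mask_object`, `interpolate_list`, and the grayscale list-of-lists image format are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical: grayscale H x W image, values in [0, 1]
Mask = List[List[bool]]    # True where the critical object is


def mask_object(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Build a negative image by replacing the object's pixels with a constant fill."""
    return [[fill if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def interpolate_list(positive: Image, negative: Image, mask: Mask, k: int) -> List[Image]:
    """Return k + 1 images whose masked region moves linearly from the negative
    image (t = 0) to the positive image (t = 1). Pixels outside the mask are
    copied from the positive image. The list is in ascending object visibility."""
    if k < 1:
        raise ValueError("k must be at least 1")
    images = []
    for step in range(k + 1):
        t = step / k
        img = [[(1 - t) * n + t * p if m else p
                for p, n, m in zip(prow, nrow, mrow)]
               for prow, nrow, mrow in zip(positive, negative, mask)]
        images.append(img)
    return images
```

Reversing the returned list gives a best-first ordering (fully visible object first), which is the order a listwise ranking loss would consume. No extra annotation is needed: the ranking follows directly from the interpolation step.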

Code Implementations: 1 repo