AI · Jun 13, 2025

Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

arXiv:2506.11712v2 · 10 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This paper addresses hallucination in MLLMs to improve their reliability, but the advance is incremental because it builds on existing Direct Preference Optimization methods.

The paper tackles hallucination in Multimodal Large Language Models (MLLMs) by proposing Symmetric Multimodal Preference Optimization (SymMPO), which is reported to achieve superior hallucination-mitigation performance across five benchmarks.

Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). Although existing methods have achieved significant progress by using vision-oriented contrastive objectives to enhance MLLMs' attention to visual inputs and hence reduce hallucination, they suffer from a non-rigorous optimization objective and indirect preference supervision. To address these limitations, we propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs) to enhance visual understanding, while maintaining rigorous theoretical alignment with standard DPO. In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss to quantitatively regulate the preference gap between symmetric preference pairs. Comprehensive evaluation across five benchmarks demonstrates SymMPO's superior performance, validating its effectiveness in mitigating hallucination in MLLMs.
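To make the abstract's terms concrete, here is a minimal sketch of the standard DPO objective on a single response pair, written in plain Python on scalar log-probabilities. The function names and the `beta` default are illustrative assumptions. The `margin_consistency_penalty` helper is a hypothetical stand-in for the idea of regulating the gap between two symmetric pairs; it is not the paper's loss, whose exact form the abstract does not give.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_margin(policy_logp_chosen: float,
                      policy_logp_rejected: float,
                      ref_logp_chosen: float,
                      ref_logp_rejected: float,
                      beta: float = 0.1) -> float:
    """Implicit reward gap between the chosen and rejected responses.

    Each implicit reward is beta * (policy log-prob - reference log-prob).
    """
    chosen_reward = policy_logp_chosen - ref_logp_chosen
    rejected_reward = policy_logp_rejected - ref_logp_rejected
    return beta * (chosen_reward - rejected_reward)


def dpo_loss(policy_logp_chosen: float,
             policy_logp_rejected: float,
             ref_logp_chosen: float,
             ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    margin = preference_margin(policy_logp_chosen, policy_logp_rejected,
                               ref_logp_chosen, ref_logp_rejected, beta)
    return -log_sigmoid(margin)


def margin_consistency_penalty(margin_a: float, margin_b: float) -> float:
    """HYPOTHETICAL illustration only, not the paper's formulation.

    Penalizes the squared difference between the margins of two paired
    preference comparisons, so the two gaps are pushed toward each other.
    """
    return (margin_a - margin_b) ** 2
```

The DPO loss equals log 2 when the policy matches the reference, and it shrinks as the policy raises the chosen response's log-probability relative to the rejected one. The penalty term is zero only when the two margins agree.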

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
