AI · CL · LG · Feb 5, 2025

PerPO: Perceptual Preference Optimization via Discriminative Rewarding

arXiv:2502.04371v1 · 8 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the problem of aligning multimodal large language models (MLLMs) with human visual perception. It is aimed at AI researchers and practitioners and represents an incremental advance in alignment strategies.

The paper tackles the weak visual discrimination of multimodal large language models (MLLMs) by introducing Perceptual Preference Optimization (PerPO). PerPO combines discriminative rewarding with listwise preference optimization to improve visual discrimination while preserving the models' generative capabilities, and the authors report consistent gains across visual tasks.
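The summary does not say which rewards PerPO computes. As a minimal sketch, assuming an object-grounding task in which each sampled response is a bounding box, a discriminative reward could be the intersection-over-union (IoU) with the ground-truth box; sorting the sampled candidates by that score yields a ranked list of negatives for a listwise objective. The helper names `iou` and `rank_by_discriminative_reward` and the `(x1, y1, x2, y2)` box format are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if disjoint)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rank_by_discriminative_reward(
    candidates: List[Box], gt: Box
) -> List[Tuple[Box, float]]:
    """Assumed discriminative reward: score each sampled candidate by IoU
    against the ground truth, then sort best-first to form a ranked list."""
    scored = [(c, iou(c, gt)) for c in candidates]
    return sorted(scored, key=lambda cr: cr[1], reverse=True)
```

For a ground-truth box `(0, 0, 10, 10)` and candidates `(5, 5, 15, 15)`, `(20, 20, 30, 30)` and `(0, 0, 10, 10)`, the ranked rewards are 1.0, 25/175 (about 0.143) and 0.0. Because the reward is a graded score rather than a binary label, partially correct candidates land between the exact match and the disjoint box instead of being lumped together as "wrong".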

This paper presents Perceptual Preference Optimization (PerPO), a perception alignment method aimed at addressing the visual discrimination challenges in generative pre-trained multimodal large language models (MLLMs). To align MLLMs with the human visual perception process, PerPO employs discriminative rewarding to gather diverse negative samples, followed by listwise preference optimization to rank them. By utilizing the reward as a quantitative margin for ranking, our method effectively bridges generative preference optimization and discriminative empirical risk minimization. PerPO significantly enhances MLLMs' visual discrimination capabilities while maintaining their generative strengths, mitigates image-unconditional reward hacking, and ensures consistent performance across visual tasks. This work marks a crucial step towards more perceptually aligned and versatile MLLMs. We also hope that PerPO will encourage the community to rethink MLLM alignment strategies.
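The abstract states that the reward serves as a quantitative margin for ranking but does not give the objective itself. One hypothetical reading, sketched below and not claimed to be PerPO's exact loss, applies a DPO-style pairwise term to every pair in a ranked list: whenever candidate i has a higher discriminative reward than candidate j, the policy's implicit reward gap must exceed a margin proportional to their reward gap. The function names and the coefficients `beta` and `gamma` are assumptions for illustration.

```python
import math
from typing import Sequence


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def listwise_margin_loss(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    rewards: Sequence[float],
    beta: float = 0.1,   # hypothetical DPO temperature
    gamma: float = 1.0,  # hypothetical scale turning reward gaps into margins
) -> float:
    """Hypothetical listwise loss with reward-based margins.

    policy_logps / ref_logps: sequence log-probabilities of each candidate
    under the trained policy and the frozen reference model.
    rewards: discriminative reward of each candidate (higher is better).
    """
    n = len(rewards)
    assert len(policy_logps) == n and len(ref_logps) == n
    # Implicit reward of each candidate: beta * log(pi / pi_ref).
    implicit = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if rewards[i] > rewards[j]:
                # The better candidate must win by at least this margin.
                margin = gamma * (rewards[i] - rewards[j])
                total += neg_log_sigmoid(implicit[i] - implicit[j] - margin)
                pairs += 1
    # Average over ordered pairs; a list with all-equal rewards yields 0.0.
    return total / pairs if pairs else 0.0
```

With `gamma = 0` each term reduces to the standard pairwise DPO loss; with a positive `gamma`, a candidate that is much better by the discriminative reward must be preferred by a correspondingly larger implicit-reward gap, which is one way the reward could act as a "quantitative margin".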

Code Implementations: 1 repo
