AI · CL · LG · Feb 5, 2025

PerPO: Perceptual Preference Optimization via Discriminative Rewarding

Zining Zhu, Liang Zhao, Kangheng Lin, Jinze Yang, En Yu, Chenglong Liu, Haoran Wei, Jianjian Sun, Zheng Ge, Xiangyu Zhang

arXiv:2502.04371v1 · 15.6 · 10 citations · h-index: 22 · Has Code

Originality: Incremental advance

AI Analysis

Aimed at AI researchers and practitioners, this work addresses the problem of aligning MLLMs with human visual perception and represents an incremental advance in alignment strategies.

The paper tackles the visual discrimination challenges of multimodal large language models (MLLMs) by introducing Perceptual Preference Optimization (PerPO), which combines discriminative rewarding with listwise preference optimization to strengthen visual discrimination while preserving generative capabilities. The authors report significantly improved visual discrimination and consistent performance across visual tasks.
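The summary does not say how the discriminative reward is computed. As a minimal sketch, assuming a visual grounding task where each sampled response is a bounding box and the reward is its intersection-over-union (IoU) with a ground-truth box (an illustrative choice, not taken from this page), scoring and sorting the samples yields a ranked list whose low-reward entries can serve as negative samples. The function names `iou` and `rank_candidates` and the `(x1, y1, x2, y2)` box format are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if disjoint)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rank_candidates(candidates: List[Box], truth: Box) -> List[Tuple[Box, float]]:
    """Score each sampled prediction against the ground truth; sort best-first.

    In this sketch the reward is a discriminative, ground-truth-based score,
    so low-scoring samples act as graded negatives for a later ranking loss.
    """
    scored = [(c, iou(c, truth)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    samples = [(20.0, 20.0, 30.0, 30.0), (0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0)]
    for box, reward in rank_candidates(samples, truth):
        print(box, round(reward, 3))
```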

This paper presents Perceptual Preference Optimization (PerPO), a perception alignment method aimed at addressing the visual discrimination challenges in generative pre-trained multimodal large language models (MLLMs). To align MLLMs with the human visual perception process, PerPO employs discriminative rewarding to gather diverse negative samples, followed by listwise preference optimization to rank them. By utilizing the reward as a quantitative margin for ranking, our method effectively bridges generative preference optimization and discriminative empirical risk minimization. PerPO significantly enhances MLLMs' visual discrimination capabilities while maintaining their generative strengths, mitigates image-unconditional reward hacking, and ensures consistent performance across visual tasks. This work marks a crucial step towards more perceptually aligned and versatile MLLMs. We also hope that PerPO will encourage the community to rethink MLLM alignment strategies.
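The abstract states that the reward serves as a quantitative margin for listwise ranking but gives no formula. One plausible reading, sketched here as an assumption rather than PerPO's published objective, uses DPO-style implicit rewards `beta * (log pi(y|x) - log pi_ref(y|x))` and, for every pair of candidates whose discriminative rewards differ, asks the higher-reward candidate's implicit reward to exceed the other's by `gamma` times the reward gap. The names `listwise_margin_loss`, `beta`, and `gamma` are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def listwise_margin_loss(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    rewards: Sequence[float],
    beta: float = 0.1,
    gamma: float = 1.0,
) -> float:
    """Average pairwise logistic loss over all reward-ordered pairs in one list.

    Hypothetical sketch (not the authors' code): the implicit reward of
    response k is beta * (policy_logps[k] - ref_logps[k]); for every pair with
    rewards[i] > rewards[j], the implicit-reward gap should exceed a margin of
    gamma * (rewards[i] - rewards[j]).
    """
    n = len(rewards)
    if not (len(policy_logps) == len(ref_logps) == n):
        raise ValueError("all inputs must have the same length")
    implicit = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if rewards[i] > rewards[j]:
                margin = gamma * (rewards[i] - rewards[j])
                total += -log_sigmoid(implicit[i] - implicit[j] - margin)
                pairs += 1
    # A list whose rewards are all tied contributes no ranking signal.
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.0]  # e.g. discriminative scores of three sampled responses
    ref = [0.0, 0.0, 0.0]
    print(listwise_margin_loss([0.0, 0.0, 0.0], ref, rewards, beta=1.0))
    print(listwise_margin_loss([2.0, 0.0, -2.0], ref, rewards, beta=1.0))
```

Making the margin proportional to the reward gap means near-tied candidates are pushed apart only slightly, while clearly worse candidates must be separated by a wider gap; this is one way a discriminative score can shape a generative preference loss.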

