LG | AI | Jun 8, 2025

AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models

arXiv:2506.07165v1 | 7 citations | h-index: 6 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This work addresses limitations in aligning LLMs with diverse preferences for AI applications, though it appears incremental because it builds on existing optimization paradigms.

The paper tackles the problem of multi-objective preference alignment in large language models by proposing AMoPO, which dynamically balances preference dimensions without auxiliary models, resulting in a 28.5% improvement over state-of-the-art baselines.

Existing multi-objective preference alignment methods for large language models (LLMs) face two limitations: (1) an inability to effectively balance various preference dimensions, and (2) a reliance on auxiliary reward/reference models that introduces computational complexity. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a novel framework that achieves dynamic balance across preference dimensions. By using dimension-aware generation metrics as implicit rewards within a multi-objective optimization paradigm, AMoPO aligns LLMs with diverse preferences without additional reward models or reference models. We introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing dynamic prioritization of preference dimensions. Empirical results demonstrate that AMoPO outperforms state-of-the-art baselines by 28.5%, and experiments on 7B, 14B, and 32B models reveal the scaling ability of AMoPO. Moreover, additional analysis of multiple dimensions verifies its adaptability and effectiveness. These findings validate AMoPO's capability to achieve dimension-aware preference alignment, highlighting its superiority. Our code and datasets are available at https://github.com/Javkonline/AMoPO.
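
The abstract does not give the loss or the weighting rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes (hypothetically) that each preference dimension has its own chosen/rejected pair, that the implicit reward is the length-normalized log-probability of a response, that per-dimension weights are drawn from a Gaussian and softmax-normalized, and that the weighted margin is passed through a reference-free logistic loss. The names `sample_weights`, `multi_objective_loss`, and `beta` are illustrative and do not come from the paper.

```python
import math
import random


def avg_logprob(token_logprobs):
    """Length-normalized log-probability, used here as an assumed implicit reward."""
    return sum(token_logprobs) / len(token_logprobs)


def sample_weights(means, stds, rng):
    """Draw one raw weight per dimension from a Gaussian, then softmax-normalize.

    Assumption: the Gaussian parameters are supplied by the caller; how AMoPO
    derives them is not stated in the abstract.
    """
    raw = [rng.gauss(m, s) for m, s in zip(means, stds)]
    peak = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def multi_objective_loss(chosen, rejected, weights, beta=2.0):
    """Reference-free logistic loss on a weighted sum of per-dimension margins.

    chosen / rejected: one list of token log-probs per preference dimension.
    Returns -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin)).
    """
    margin = sum(
        w * (avg_logprob(c) - avg_logprob(r))
        for w, c, r in zip(weights, chosen, rejected)
    )
    return math.log1p(math.exp(-beta * margin))


if __name__ == "__main__":
    rng = random.Random(0)
    weights = sample_weights([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng)
    chosen = [[-0.4, -0.6], [-0.5, -0.5], [-0.3, -0.9]]
    rejected = [[-1.8, -2.2], [-1.0, -1.4], [-2.5, -1.5]]
    print(weights, multi_objective_loss(chosen, rejected, weights))
```

In this sketch no reward model or reference model appears: the only inputs are the policy's own token log-probabilities, which is the property the abstract emphasizes. Everything else about the weighting and the loss shape should be checked against the linked repository.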

Code Implementations: 1 repo
