CL · Jun 4, 2025

Robust Preference Optimization via Dynamic Target Margins

arXiv:2506.03690v2 · 10 citations · h-index: 24 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in LLM alignment for ensuring safety and reliability, offering a robust, plug-and-play solution that is incremental but effective.

The paper tackles the problem of noise in preference data for aligning Large Language Models (LLMs) by proposing γ-PO, a dynamic target margin preference optimization algorithm that adjusts reward margins at the pairwise level, achieving an average 4.4% improvement over baselines on benchmarks like AlpacaEval2 and Arena-Hard.

The alignment of Large Language Models (LLMs) is crucial for ensuring their safety and reliability in practical applications. Direct Preference Optimization (DPO) has emerged as an efficient method that directly optimizes models using preference pairs, significantly reducing resource demands. However, the effectiveness of DPO depends heavily on data quality, which is frequently compromised by noise. In this work, we propose γ-PO, a dynamic target margin preference optimization algorithm that adjusts reward margins at the pairwise level. By introducing instance-specific margin calibration, γ-PO strategically prioritizes high-confidence pairs (those demonstrating higher reward margins) while suppressing potential noise from ambiguous pairs. Moreover, γ-PO is a plug-and-play method, compatible with variants of DPO that rely on the reward margin between preference pairs. Across benchmarks such as AlpacaEval2 and Arena-Hard, γ-PO achieves an average 4.4% improvement over other baselines, setting new benchmarks for state-of-the-art performance. Additionally, γ-PO requires minimal code changes and has a negligible impact on training efficiency, making it a robust solution for enhancing LLM alignment. Our code is available at https://github.com/sunjie279/gammaPO.
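To make the idea of a per-pair target margin concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the softmax-based margin rule, the function names (`dynamic_margins`, `margin_loss`), and the parameters `base_margin` and `temperature` are all assumptions chosen for illustration. The exact rule is defined in the paper and repository. The sketch only shows the general shape: pairs with larger reward margins receive larger target margins, and the average target margin stays at a fixed base value.

```python
import math


def dynamic_margins(reward_margins, base_margin=1.0, temperature=1.0):
    """Assign one target margin per preference pair (illustrative rule, assumed).

    A total budget of base_margin * n is split across the n pairs with a
    softmax over their reward margins, so higher-margin pairs get a larger
    target and the mean target equals base_margin.
    """
    n = len(reward_margins)
    peak = max(reward_margins)  # subtract the max for numerical stability
    weights = [math.exp((m - peak) / temperature) for m in reward_margins]
    total = sum(weights)
    return [base_margin * n * w / total for w in weights]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def margin_loss(reward_margins, base_margin=1.0, temperature=1.0):
    """Mean of -log sigmoid(reward_margin - target_margin) over the batch.

    In a real training loop the targets would be treated as constants
    (no gradient flows through them); that detail is also an assumption.
    """
    targets = dynamic_margins(reward_margins, base_margin, temperature)
    losses = [-log_sigmoid(m - g) for m, g in zip(reward_margins, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    batch = [2.0, 0.1, -0.5]  # hypothetical reward margins for three pairs
    print(dynamic_margins(batch))
    print(margin_loss(batch))
```

With a fixed margin, every pair is pushed toward the same target. Under this assumed rule the target varies per pair, and when all reward margins are equal it reduces to the fixed-margin case.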

