CL · Nov 28, 2025

Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization

Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu

arXiv:2511.23391

Originality: Incremental advance

AI Analysis

The paper addresses a specific bottleneck in RLHF methods for aligning language models and represents an incremental advance.

The paper tackles the problem of ambiguous content degrading performance in Direct Preference Optimization (DPO) by introducing Ambiguity Awareness Optimization (AAO), which re-weights such content based on semantic similarity, resulting in improvements of up to 8.9 points on AlpacaEval 2 and 15.0 points on Arena-Hard.
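For reference, the first equation below is the standard DPO objective for a prompt $x$ with chosen response $y_w$ and rejected response $y_l$, where $\beta$ is the usual temperature and $\pi_{\mathrm{ref}}$ the frozen reference policy. The second is a generic token-weighted variant of the kind this summary describes, with per-token weights $w_t^{(w)}$ and $w_t^{(l)}$. These weights are placeholders: how AAO actually derives them from semantic similarity is not specified on this page. Because a sequence log-probability is the sum of its token log-probabilities, setting every weight to 1 recovers standard DPO.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\!\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]

\mathcal{L}_{\mathrm{weighted}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\!\left[\log\sigma\!\left(\beta\sum_{t=1}^{|y_w|} w_t^{(w)}\log\frac{\pi_\theta(y_{w,t}\mid x, y_{w,<t})}{\pi_{\mathrm{ref}}(y_{w,t}\mid x, y_{w,<t})} - \beta\sum_{t=1}^{|y_l|} w_t^{(l)}\log\frac{\pi_\theta(y_{l,t}\mid x, y_{l,<t})}{\pi_{\mathrm{ref}}(y_{l,t}\mid x, y_{l,<t})}\right)\right]
```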

Direct Preference Optimization (DPO) is a widely used reinforcement learning from human feedback (RLHF) method across various domains. Recent research has increasingly focused on the role of token importance in improving DPO effectiveness. We observe that identical or semantically similar content (defined as ambiguous content) frequently appears within preference pairs. We hypothesize that the presence of ambiguous content during DPO training may introduce ambiguity, thereby limiting further improvements in alignment. Through mathematical analysis and proof-of-concept experiments, we show that ambiguous content can introduce ambiguities that degrade performance. To address this issue, we introduce Ambiguity Awareness Optimization (AAO), a simple yet effective approach that automatically re-weights ambiguous content to reduce ambiguities, using semantic similarity computed from preference pairs. Through extensive experiments across multiple model scales and widely adopted benchmarks, including AlpacaEval 2, MT-Bench, and Arena-Hard, we demonstrate that AAO consistently and significantly surpasses state-of-the-art approaches without markedly increasing response length. Specifically, AAO outperforms DPO by up to 8.9 points on AlpacaEval 2 and achieves an improvement of up to 15.0 points on Arena-Hard.
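The abstract does not give AAO's weighting formula, so the sketch below is not AAO itself. It is a minimal illustration, under stated assumptions, of the general idea of down-weighting content shared between the chosen and rejected responses inside a DPO loss. Everything here is hypothetical: the names `Response`, `ambiguity_weights`, and `weighted_dpo_loss`; the availability of per-token embeddings; and the rule "weight = 1 minus the best cosine similarity to any token in the other response, clipped to [0, 1]". Under that rule, a token with an identical counterpart contributes nothing to the preference margin.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass
class Response:
    """Per-token data for one response; every list has one entry per token."""
    policy_logps: List[float]   # log pi_theta(y_t | x, y_<t)
    ref_logps: List[float]      # log pi_ref(y_t | x, y_<t)
    embeddings: List[Vector]    # hypothetical token embeddings used for similarity


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def ambiguity_weights(own: List[Vector], other: List[Vector]) -> List[float]:
    """Hypothetical rule (not the paper's): weight = 1 - best cosine match in
    the other response, with the similarity clipped to [0, 1]."""
    weights = []
    for e in own:
        best = max((cosine(e, o) for o in other), default=0.0)
        weights.append(1.0 - min(1.0, max(0.0, best)))
    return weights


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(chosen: Response, rejected: Response, beta: float = 0.1) -> float:
    """DPO loss in which each token's policy/reference log-ratio is scaled
    by its (hypothetical) ambiguity weight before summing."""
    w_c = ambiguity_weights(chosen.embeddings, rejected.embeddings)
    w_r = ambiguity_weights(rejected.embeddings, chosen.embeddings)
    ratio_c = sum(w * (p - r) for w, p, r in zip(w_c, chosen.policy_logps, chosen.ref_logps))
    ratio_r = sum(w * (p - r) for w, p, r in zip(w_r, rejected.policy_logps, rejected.ref_logps))
    return -log_sigmoid(beta * (ratio_c - ratio_r))


if __name__ == "__main__":
    shared = [1.0, 0.0]  # a token that appears in both responses
    chosen = Response([-0.5, -1.0], [-0.7, -1.5], [shared, [0.0, 1.0]])
    rejected = Response([-0.6, -2.0], [-0.7, -1.8], [shared, [0.0, -1.0]])
    print(ambiguity_weights(chosen.embeddings, rejected.embeddings))  # [0.0, 1.0]
    print(weighted_dpo_loss(chosen, rejected))
```

In the usage example, the shared first token receives weight 0, so only the distinct second tokens drive the margin. If every weight were 1, the loss would reduce to ordinary sequence-level DPO. A real implementation would operate on batched tensors and use model-derived embeddings rather than hand-written vectors.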

