CL AI Feb 15, 2025

Self-supervised Attribute-aware Dynamic Preference Ranking Alignment

Hongyu Yang, Qi Zhao, Zhenhua Hu, Rui Li

arXiv:2502.12189v1 2.7 h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of scalable, consistent preference alignment for AI systems in multi-response settings. It is an incremental advance, since it builds on existing alignment methods.

The paper tackles the problem of aligning AI responses with human preferences in list-level scenarios, such as community Q&A, without costly human annotations. It proposes a self-supervised method that quantifies preference differences between responses and dynamically determines the alignment order, and it reports superior performance on a new dataset and on preference datasets from eight domains.

Reinforcement Learning from Human Feedback and its variants excel in aligning with human intentions to generate helpful, harmless, and honest responses. However, most of them rely on costly human-annotated pairwise comparisons for supervised alignment, which is not suitable for list-level scenarios, such as community question answering. Additionally, human preferences are influenced by multiple intrinsic factors in responses, leading to decision-making inconsistencies. Therefore, we propose Self-supervised Attribute-aware dynamic preference ranking, called SeAdpra. It quantifies preference differences between responses based on Attribute-Perceptual Distance Factors (APDF) and dynamically determines the list-wise alignment order. Furthermore, it achieves fine-grained preference difference learning and enables precise alignment with the optimal one. We specifically constructed a challenging code preference dataset named StaCoCoQA, and introduced more cost-effective and scalable preference evaluation metrics: PrefHit and PrefRecall. Extensive experimental results show that SeAdpra exhibits superior performance and generalizability on both StaCoCoQA and preference datasets from eight popular domains.
