LG · Feb 6

Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences

Idan Pipano, Shoham Sabach, Kavosh Asadi, Mohammad Ghavamzadeh

arXiv:2602.06788v1 · 1.4 · h-index: 27

Originality: Incremental advance

AI Analysis

This work addresses the alignment of language models with human preferences and presents an incremental improvement over existing DPO-style methods.

The paper generalizes the DPO algorithm to nonconvex $f$-divergences. It identifies one condition on $f$ under which the RLHF problem remains tractable and a second condition that is necessary for displacement resistance. It then introduces SquaredPO, a loss that offers stronger theoretical guarantees than DPO while performing competitively in practice.

DPO and related algorithms align language models by directly optimizing the RLHF objective: find a policy that maximizes the Bradley-Terry reward while staying close to a reference policy through a KL divergence penalty. Previous work showed that this approach can be generalized further: the original problem remains tractable even if the KL divergence is replaced by any member of the family of $f$-divergences with a convex generating function $f$. Our first contribution is to show that convexity of $f$ is not essential. Instead, we identify a more general condition, referred to as DPO-inducing, that precisely characterizes when the RLHF problem remains tractable. Our next contribution is to establish a second condition on $f$ that is necessary to prevent probability displacement, a known empirical phenomenon in which the probabilities of both the winner and the loser responses approach zero. We refer to any $f$ that satisfies this condition as displacement-resistant. Finally, we focus on a specific DPO-inducing and displacement-resistant $f$, leading to our novel SquaredPO loss. Compared to DPO, this new loss offers stronger theoretical guarantees while performing competitively in practice.
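
To make the abstract's two ingredients concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair, plus a generic discrete $f$-divergence. It is not the paper's SquaredPO loss, whose form is not given here. The function names, argument names, and the value `beta=0.1` are illustrative assumptions. The sketch shows one way to see why displacement is possible under DPO: the loss depends only on the difference between the winner's and the loser's log-ratios, so lowering both log-probabilities by the same amount leaves the loss unchanged.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (winner, loser) pair.

    All arguments are sequence log-probabilities; the names and the
    default beta are illustrative assumptions, not taken from the paper.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


def f_divergence(p, q, f):
    """Discrete f-divergence D_f(p || q) = sum_i q_i * f(p_i / q_i)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    """Generator f(t) = t * log(t), which recovers the KL divergence."""
    return t * math.log(t) if t > 0 else 0.0


# Baseline pair versus the same pair with both log-probabilities
# lowered by 5 nats: the DPO loss does not change.
base = dpo_loss(-2.0, -3.0, -2.5, -2.5)
shifted = dpo_loss(-2.0 - 5.0, -3.0 - 5.0, -2.5, -2.5)
print(abs(base - shifted) < 1e-12)

# KL divergence between two small distributions via the generic form.
print(f_divergence([0.5, 0.5], [0.9, 0.1], kl_generator))
```

Swapping `kl_generator` for another generating function changes the divergence without touching `f_divergence`. Whether a given generator is DPO-inducing or displacement-resistant is determined by the paper's conditions, which this sketch does not implement.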
