LG, Feb 6

Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences

arXiv:2602.06788v1
h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses the alignment of language models and presents an incremental improvement over existing DPO-style methods.

The paper generalizes the DPO algorithm to nonconvex $f$-divergences and identifies conditions on $f$ for tractability and displacement resistance. It also introduces SquaredPO, a loss that offers stronger theoretical guarantees than DPO while performing competitively.

DPO and related algorithms align language models by directly optimizing the RLHF objective: find a policy that maximizes the Bradley-Terry reward while staying close to a reference policy through a KL divergence penalty. Previous work showed that this approach can be generalized further: the original problem remains tractable even if the KL divergence is replaced by any $f$-divergence with a convex generating function $f$. Our first contribution is to show that convexity of $f$ is not essential. Instead, we identify a more general condition, referred to as DPO-inducing, that precisely characterizes when the RLHF problem remains tractable. Our next contribution is to establish a second condition on $f$ that is necessary to prevent probability displacement, a known empirical phenomenon in which the probabilities of both the winner and loser responses approach zero. We refer to any $f$ that satisfies this condition as displacement-resistant. Finally, we focus on a specific DPO-inducing and displacement-resistant $f$, leading to our novel SquaredPO loss. Compared to DPO, this new loss offers stronger theoretical guarantees while performing competitively in practice.
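
To make the probability displacement phenomenon concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is the well-known KL-based DPO loss, not the paper's SquaredPO loss, whose form is not given in this abstract. The function name `dpo_loss`, the value `beta=0.1`, and the log-probabilities are illustrative assumptions. The loss depends only on the difference between the winner and loser log-ratios, so lowering both log-probabilities by the same amount leaves it unchanged.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (winner, loser) pair of sequence log-probabilities.

    Illustrative sketch only: names and the default beta are assumptions.
    """
    # Implicit reward margin: beta * (winner log-ratio - loser log-ratio).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) computed stably as softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities under the reference policy.
ref_w, ref_l = -2.0, -2.0

# A policy that prefers the winner over the loser.
base = dpo_loss(-1.5, -3.0, ref_w, ref_l)

# The same policy with BOTH responses made far less likely (shifted by -5 nats).
shifted = dpo_loss(-1.5 - 5.0, -3.0 - 5.0, ref_w, ref_l)

# The loss cannot tell the two policies apart.
print(math.isclose(base, shifted))
```

Because the two policies receive the same loss, nothing in this objective by itself discourages the winner's probability from shrinking along with the loser's, which is the behavior a displacement-resistant $f$ is meant to rule out.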

