CL · LG · Oct 29, 2024

$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization

arXiv:2410.21662v2 · 16 citations · h-index: 18 · Has Code · AISTATS
Originality: Incremental advance
AI Analysis

This work provides a unified framework for language model alignment, offering practical algorithms and theoretical insights, though its contribution appears incremental because it extends existing methods.

The paper tackles the problem of aligning language models with human preferences by introducing $f$-PO, a framework that generalizes existing preference optimization methods using $f$-divergences and reports stronger results than existing methods on benchmarks such as AlpacaEval 2 and MT-Bench.

Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces $f$-divergence Preference Optimization ($f$-PO), a novel framework that generalizes and extends existing approaches. $f$-PO minimizes $f$-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of $f$-divergences. We provide theoretical analysis of $f$-PO's properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate $f$-PO's effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different $f$-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at https://github.com/MinkaiXu/fPO.
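
To make the idea of "minimizing an $f$-divergence between the optimized policy and the optimal policy" concrete, here is a minimal sketch on a single preference pair. It is an illustration under stated assumptions, not the paper's estimator or the code in the linked repository: the function names, the two-outcome (chosen, rejected) setup, the smoothed target distribution, and the divergence direction are all hypothetical choices made for this example. It uses the standard definition $D_f(p \| q) = \sum_i q_i f(p_i / q_i)$ and shows that, with $f(u) = u \log u$ and a one-hot target, the loss coincides with the familiar DPO logistic loss.

```python
import math

def f_divergence(p, q, f):
    """Standard discrete f-divergence D_f(p || q) = sum_i q_i * f(p_i / q_i), q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(u):
    # f(u) = u log u (with 0 log 0 := 0) yields KL(p || q).
    return u * math.log(u) if u > 0 else 0.0

def reverse_kl_generator(u):
    # f(u) = -log u yields KL(q || p); needs u > 0, i.e. a smoothed target.
    return -math.log(u)

def pairwise_policy_dist(logratio_chosen, logratio_rejected, beta):
    """Two-way softmax over beta-scaled log-ratios log(pi_theta / pi_ref).

    Hypothetical helper: returns [P(chosen), P(rejected)] implied by the policy.
    """
    d = beta * (logratio_chosen - logratio_rejected)
    p_chosen = 1.0 / (1.0 + math.exp(-d))
    return [p_chosen, 1.0 - p_chosen]

def pairwise_f_loss(logratio_chosen, logratio_rejected, beta, f, eps=0.0):
    """Assumed loss: D_f(target || policy) with target = [1 - eps, eps]."""
    target = [1.0 - eps, eps]
    policy = pairwise_policy_dist(logratio_chosen, logratio_rejected, beta)
    return f_divergence(target, policy, f)

# Example log-ratios (made-up numbers for illustration only).
lr_w, lr_l, beta = 0.5, -0.3, 0.1

# With f(u) = u log u and a one-hot target, the loss equals -log sigmoid(beta * margin).
kl_loss = pairwise_f_loss(lr_w, lr_l, beta, kl_generator)
dpo_loss = -math.log(1.0 / (1.0 + math.exp(-beta * (lr_w - lr_l))))

# A different generator gives a different objective on the same pair.
rkl_loss = pairwise_f_loss(lr_w, lr_l, beta, reverse_kl_generator, eps=0.1)
```

Swapping the generator `f` while keeping everything else fixed is the sense in which one framework can cover a family of objectives. Which generators, targets, and sampling schemes $f$-PO actually uses should be taken from the paper and repository rather than from this sketch.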

Code Implementations: 1 repo
