LG, CL | Feb 20, 2025

Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization

arXiv:2502.14187v1 | 14.46 citations | h-index: 17 | WWW

Originality: Incremental advance

AI Analysis

This work addresses the need for robust and scalable fine-tuning methods in privacy-preserving, decentralized federated learning environments, though it appears incremental as it compares existing methods in a specific setup.

The paper compares Kahneman-Tversky Optimization (KTO) with Direct Preference Optimization (DPO) for fine-tuning large language models in federated learning settings. It reports that KTO consistently outperformed DPO on MT-Bench-1, Vicuna, and AdvBench, and that KTO remained usable on a redistributed dataset where DPO, which requires paired responses, could not be applied.

We evaluate Kahneman-Tversky Optimization (KTO) as a fine-tuning method for large language models (LLMs) in federated learning (FL) settings, comparing it against Direct Preference Optimization (DPO). Using Alpaca-7B as the base model, we fine-tune on a realistic dataset under both methods and evaluate performance using MT-Bench-1, Vicuna, and AdvBench benchmarks. Additionally, we introduce a redistributed dataset setup, where only KTO is applicable due to its ability to handle single-response feedback, unlike DPO's reliance on paired responses. Our results demonstrate that KTO, in both its original (KTOO) and redistributed (KTOR) configurations, consistently outperforms DPO across all benchmarks. In the redistributed setup, KTO further validates its flexibility and resilience by maintaining superior performance in scenarios where DPO cannot be applied. These findings establish KTO as a robust and scalable fine-tuning method for FL, motivating its adoption for privacy-preserving, decentralized, and heterogeneous environments.
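The difference between paired and single-response feedback can be made concrete with a small sketch. The abstract does not describe how the redistributed dataset is built, so the following is only one plausible reading, assuming each paired record is split into two labeled single-response records that are then shuffled across clients. The function names `unpair` and `redistribute`, the field names, and the round-robin assignment are all hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List

Record = Dict[str, object]


def unpair(pairs: List[Dict[str, str]]) -> List[Record]:
    """Split each (prompt, chosen, rejected) record into two single-response records.

    Assumption: a 'chosen' response becomes a desirable example (label=True)
    and a 'rejected' response becomes an undesirable one (label=False).
    """
    singles: List[Record] = []
    for p in pairs:
        singles.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        singles.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return singles


def redistribute(singles: List[Record], num_clients: int, seed: int = 0) -> List[List[Record]]:
    """Shuffle single-response records and deal them round-robin to clients.

    Assumption: after shuffling, a client may hold only one half of an
    original pair, so a pairwise objective cannot be computed locally.
    """
    rng = random.Random(seed)
    shuffled = singles[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    clients: List[List[Record]] = [[] for _ in range(num_clients)]
    for i, example in enumerate(shuffled):
        clients[i % num_clients].append(example)
    return clients


# Toy paired preference data (hypothetical content).
pairs = [
    {"prompt": f"q{i}", "chosen": f"good{i}", "rejected": f"bad{i}"}
    for i in range(4)
]
clients = redistribute(unpair(pairs), num_clients=3)
```

In this sketch, a client's shard contains individual labeled responses rather than matched pairs, which is the kind of data the abstract says KTO can use and DPO cannot.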
