LG · Oct 27, 2025

Lightweight Robust Direct Preference Optimization

Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe

arXiv:2510.23590v1 · 2 citations · h-index: 8

Originality: Incremental advance

AI Analysis

The paper addresses the noise sensitivity of DPO when fine-tuning large language models. The advance is incremental: it builds on prior DRO-based methods, and its main contribution is a lightweight formulation.

Direct Preference Optimization (DPO) is sensitive to noisy data and prone to overfitting. The paper proposes DPO-PRO, a fine-tuning algorithm that is more robust to noisy preference signals than existing DPO variants.

Direct Preference Optimization (DPO) has become a popular method for fine-tuning large language models (LLMs) due to its stability and simplicity. However, it is also known to be sensitive to noise in the data and prone to overfitting. Recent works have proposed using distributionally robust optimization (DRO) to address potential noise and distributional shift in the data. However, these methods often suffer from excessive conservatism and high computational cost. We propose DPO-PRO (DPO with Preference Robustness), a robust fine-tuning algorithm based on DPO which accounts for uncertainty in the preference distribution through a lightweight DRO formulation. Unlike prior DRO-based variants, DPO-PRO focuses solely on uncertainty in preferences, avoiding unnecessary conservatism and incurring negligible computational overhead. We further show that DPO-PRO is equivalent to a regularized DPO objective that penalizes model overconfidence under weak preference signals. We evaluate DPO-PRO on standard alignment benchmarks and a real-world public health task. Experimental results show that our method consistently improves robustness to noisy preference signals compared to existing DPO variants.
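For readers unfamiliar with the baseline, the following is a minimal sketch of the standard DPO loss for one preference pair, plus a common soft-label generalization in which the preference is a probability q rather than a hard 0/1 label. It is not the DPO-PRO objective, whose exact form the abstract does not state. The function names, the scalar log-probability inputs, and the default beta=0.1 are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin(policy_chosen_logp: float, policy_rejected_logp: float,
               ref_chosen_logp: float, ref_rejected_logp: float,
               beta: float = 0.1) -> float:
    """Scaled difference of policy-vs-reference log-ratios (the implicit reward margin)."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return beta * (chosen_ratio - rejected_ratio)


def dpo_loss(margin: float) -> float:
    """Standard DPO loss: the chosen response is treated as preferred with certainty."""
    return -log_sigmoid(margin)


def soft_label_dpo_loss(margin: float, q: float) -> float:
    """Cross-entropy against a soft preference probability q in [0, 1].

    Illustrative assumption only: q = 1 recovers the standard DPO loss,
    while q near 0.5 models a weak preference signal.
    """
    return -(q * log_sigmoid(margin) + (1.0 - q) * log_sigmoid(-margin))
```

With a weak signal (q = 0.5), the soft-label loss is smallest at a margin of zero and grows as the model becomes more confident in either direction. This gives some intuition for why an objective that accounts for preference uncertainty can act as a penalty on overconfidence.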
