LG, CL
Oct 16, 2024

Preference Optimization with Multi-Sample Comparisons

arXiv:2410.12138v2
19 citations
h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses limitations in alignment methods for generative models like LLMs and diffusion models, offering incremental improvements in optimizing group characteristics such as diversity and bias.

The paper addresses the difficulty of controlling diversity and bias in generative models when post-training relies on single-sample comparisons. It extends post-training to multi-sample comparisons through two methods, mDPO and mIPO, and reports empirical results in which these improve diversity and robustness relative to single-sample approaches.

Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training. However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preferences (DAP) methods primarily utilize single-sample comparisons. These approaches often fail to capture critical characteristics such as generative diversity and bias, which are more accurately assessed through multiple samples. To address these limitations, we introduce a novel approach that extends post-training to include multi-sample comparisons. To achieve this, we propose Multi-sample Direct Preference Optimization (mDPO) and Multi-sample Identity Preference Optimization (mIPO). These methods improve on traditional DAP methods by focusing on group-wise characteristics. Empirically, we demonstrate that multi-sample comparison is more effective than single-sample comparison in optimizing collective characteristics (e.g., diversity and bias) for generative models. Additionally, our findings suggest that multi-sample comparisons provide a more robust optimization framework, particularly for datasets with label noise.
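
To make the idea of a multi-sample comparison concrete, here is a minimal sketch of what a group-wise DPO-style loss could look like. The abstract does not state the mDPO objective, so everything in it is an assumption for illustration: the function name `multi_sample_dpo_loss`, its parameters, and the choice to average the per-sample log-ratios over each group before applying the usual DPO logistic loss. With one sample per group it reduces to the standard single-sample DPO loss.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _mean_log_ratio(policy_logps, ref_logps) -> float:
    """Average of log pi(y|x) - log pi_ref(y|x) over one group of samples."""
    if len(policy_logps) != len(ref_logps) or len(policy_logps) == 0:
        raise ValueError("each group needs matching, non-empty log-prob lists")
    return sum(p - r for p, r in zip(policy_logps, ref_logps)) / len(policy_logps)


def multi_sample_dpo_loss(policy_logps_w, ref_logps_w,
                          policy_logps_l, ref_logps_l,
                          beta: float = 0.1) -> float:
    """Hypothetical group-wise DPO loss (illustrative assumption, not the
    paper's stated objective).

    policy_logps_w / ref_logps_w: sequence log-probs of the preferred group
    under the policy and the reference model.
    policy_logps_l / ref_logps_l: the same for the dispreferred group.
    """
    margin = beta * (
        _mean_log_ratio(policy_logps_w, ref_logps_w)
        - _mean_log_ratio(policy_logps_l, ref_logps_l)
    )
    # -log(sigmoid(margin)) == softplus(-margin)
    return _softplus(-margin)


if __name__ == "__main__":
    # Two samples per group; the numbers are made up for the example.
    loss = multi_sample_dpo_loss(
        policy_logps_w=[-1.0, -2.0], ref_logps_w=[-2.0, -3.0],
        policy_logps_l=[-3.0, -1.0], ref_logps_l=[-2.0, -1.0],
    )
    print(f"group-wise loss: {loss:.4f}")
```

Averaging within each group makes the comparison depend on the group as a whole and not on any single response, which is the property the abstract attributes to multi-sample comparisons. Other aggregations (a sum, or a statistic computed over the group) would fit the same skeleton.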

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
