IT · AI · May 6, 2025

Soft Best-of-n Sampling for Model Alignment

arXiv:2505.03156v1 · 19 citations · h-index: 31 · ISIT
Originality: Incremental advance
AI Analysis

This work addresses a practical challenge in language model alignment, offering finer control over the trade-off between reward and distortion. It is incremental in that it builds on existing Best-of-n sampling techniques.

The paper tackles the distortion problem in Best-of-n sampling for aligning language models with human preferences. It introduces Soft Best-of-n sampling, which uses a temperature parameter to interpolate smoothly between the original and reward-maximizing distributions, and shows that the method converges to the optimal tilted distribution at a rate of O(1/n) in both KL divergence and expected (relative) reward.

Best-of-$n$ (BoN) sampling is a practical approach for aligning language model outputs with human preferences without expensive fine-tuning. BoN sampling is performed by generating $n$ responses to a prompt and then selecting the sample that maximizes a reward function. BoN yields high reward values in practice at a distortion cost, as measured by the KL-divergence between the sampled and original distribution. This distortion is coarsely controlled by varying the number of samples: larger $n$ yields a higher reward at a higher distortion cost. We introduce Soft Best-of-$n$ sampling, a generalization of BoN that allows for smooth interpolation between the original distribution and reward-maximizing distribution through a temperature parameter $λ$. We establish theoretical guarantees showing that Soft Best-of-$n$ sampling converges sharply to the optimal tilted distribution at a rate of $O(1/n)$ in KL and the expected (relative) reward. For sequences of discrete outputs, we analyze an additive reward model that reveals the fundamental limitations of blockwise sampling.
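The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes the soft selection is a softmax over the rewards of the $n$ candidates, scaled by $1/λ$; the abstract only states that a temperature parameter interpolates between the two distributions, so this form is an assumption rather than the authors' stated algorithm. The function name `soft_best_of_n` and its arguments are hypothetical.

```python
import math
import random


def soft_best_of_n(candidates, reward, lam, rng=random):
    """Pick one candidate with probability proportional to exp(reward / lam).

    Assumption (not confirmed by the abstract): the soft selection is a
    softmax over the candidates' rewards with temperature lam.
    lam == 0 is treated as hard Best-of-n; lam == math.inf picks uniformly,
    which leaves the original distribution unchanged.
    """
    rewards = [reward(c) for c in candidates]
    if lam == 0:
        # Hard Best-of-n: return the candidate with the highest reward.
        best = max(range(len(candidates)), key=rewards.__getitem__)
        return candidates[best]
    # Subtract the maximum reward before exponentiating to avoid overflow.
    top = max(rewards)
    weights = [math.exp((r - top) / lam) for r in rewards]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Toy stand-ins: the "responses" are strings and the reward is length.
    responses = ["ok", "fine", "very good"]
    for lam in (0, 0.5, math.inf):
        print(lam, soft_best_of_n(responses, len, lam))
```

In this sketch, small values of `lam` concentrate the choice on the highest-reward candidate, while large values approach a uniform pick among the $n$ samples, which mirrors the interpolation the abstract describes.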

