MLLG · May 22, 2025

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

arXiv:2505.17288v1 · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work offers theoretical guidance to machine learning researchers and practitioners who must choose an adaptation method for language models. It is incremental, however: it analyzes existing methods within a single, specific case study.

The paper theoretically compares supervised fine-tuning and Best-of-N for adapting large language models to bit string generation. It finds that supervised fine-tuning outperforms Best-of-N in realizable settings, with a convergence rate that depends more favorably on the response length, while Best-of-N can have the advantage in some non-realizable scenarios.

Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, supervised fine-tuning, trains a new next-token predictor on good generations. The second, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with a better dependence on the response length.
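To make the two pipelines concrete, here is a minimal Python sketch built on a hypothetical toy setup that is not the paper's formal model. Responses are bit strings of length `L`, the "good" response is a fixed hidden string `TARGET`, and the base model flips a fair coin for each bit. SFT is represented by a position-wise next-token predictor fit on noisy demonstrations and decoded greedily. BoN is represented by a crude linear reward model fit on reward-labelled base-model samples, then used to pick the best of `N` fresh base-model draws. Every name here (`TARGET`, `sft_fit`, `reward_model_fit`, `best_of_n`, the noise level, and the sample sizes) is an illustrative assumption. The sketch shows only the mechanics of each method and does not reproduce the paper's convergence-rate analysis.

```python
import random

# Hypothetical toy setup (an assumption, not the paper's formal model):
# responses are bit strings of length L; the "good" response is a hidden string.
L = 8
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def true_reward(y):
    """Fraction of positions where y agrees with the hidden target."""
    return sum(int(a == b) for a, b in zip(y, TARGET)) / L


def base_model_sample(rng):
    """Unaltered base model: each bit is an independent fair coin flip."""
    return [rng.randint(0, 1) for _ in range(L)]


# ---- Supervised fine-tuning: fit a next-token predictor on good generations ----
def sft_fit(demos):
    """Estimate P(bit_i = 1) at each position from the demonstrations.

    For simplicity this hypothetical predictor ignores the prefix, so it is a
    position-wise model rather than a general autoregressive one.
    """
    n = len(demos)
    return [sum(d[i] for d in demos) / n for i in range(L)]


def sft_generate(probs):
    """Greedy decoding: emit the more likely bit at each position."""
    return [1 if p >= 0.5 else 0 for p in probs]


# ---- Best-of-N: fit a reward model, then select among base-model samples ----
def reward_model_fit(samples, rewards):
    """Hypothetical linear reward model: weight_i is the mean reward when
    bit_i = 1 minus the mean reward when bit_i = 0."""
    weights = []
    for i in range(L):
        ones = [r for y, r in zip(samples, rewards) if y[i] == 1]
        zeros = [r for y, r in zip(samples, rewards) if y[i] == 0]
        mean1 = sum(ones) / max(len(ones), 1)
        mean0 = sum(zeros) / max(len(zeros), 1)
        weights.append(mean1 - mean0)
    return weights


def learned_reward(weights, y):
    """Score a response with the learned linear reward model."""
    return sum(w * b for w, b in zip(weights, y))


def best_of_n(weights, n_candidates, rng):
    """Draw N responses from the base model and keep the highest-scoring one."""
    candidates = [base_model_sample(rng) for _ in range(n_candidates)]
    best = max(candidates, key=lambda y: learned_reward(weights, y))
    return best, candidates


rng = random.Random(0)

# SFT data: demonstrations from a noisy expert that flips each bit w.p. 0.1.
demos = [[b ^ (rng.random() < 0.1) for b in TARGET] for _ in range(50)]
sft_output = sft_generate(sft_fit(demos))

# Reward-model data: base-model samples labelled with their true reward.
rm_samples = [base_model_sample(rng) for _ in range(400)]
weights = reward_model_fit(rm_samples, [true_reward(y) for y in rm_samples])
bon_output, candidates = best_of_n(weights, n_candidates=16, rng=rng)

print("SFT:", sft_output, true_reward(sft_output))
print("BoN:", bon_output, true_reward(bon_output))
```

The contrast the abstract draws is visible in the structure. SFT changes the generator itself, so its output is only as good as the learned per-token predictions. BoN leaves the base model untouched, so its output is limited both by the reward model's accuracy and by how often a good string appears among the `N` base-model draws.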
