CL AI LG · Feb 25, 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Berkeley

arXiv:2502.18581v241.3160 citations · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for efficient, scalable methods that improve LLM reasoning without computationally intensive reward models. The advance is incremental, since it builds on existing selection techniques.

The paper targets LLM reasoning performance under Best-of-N selection. It proposes self-certainty, a reward-free metric that uses the model's own output probability distribution to estimate response quality, and reports scalable, efficient gains across a range of reasoning tasks.

Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs) through increased test-time computation. Current state-of-the-art methods often employ computationally intensive reward models for response evaluation and selection. Reward-free alternatives, like self-consistency and universal self-consistency, are limited in their ability to handle open-ended generation tasks or scale effectively. To address these limitations, we propose self-certainty, a novel and efficient metric that leverages the inherent probability distribution of LLM outputs to estimate response quality without requiring external reward models. We hypothesize that higher distributional self-certainty, aggregated across multiple samples, correlates with improved response accuracy, as it reflects greater confidence in the generated output. Through extensive experiments on various reasoning tasks, we demonstrate that self-certainty (1) scales effectively with increasing sample size N, akin to reward models but without the computational overhead; (2) complements chain-of-thought, improving reasoning performance beyond greedy decoding; and (3) generalizes to open-ended tasks where traditional self-consistency methods fall short. Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities. The code is available at https://github.com/backprop07/Self-Certainty
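The selection idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula, so the score here is an assumption: each next-token distribution is scored by its KL divergence from the uniform distribution, the per-token scores are averaged over the response, and the highest-scoring candidate is picked. The function names (`token_certainty`, `response_certainty`, `best_of_n`) are hypothetical and are not taken from the authors' repository, and the simple argmax stands in for whatever aggregation across samples the paper actually uses.

```python
import math
from typing import List, Sequence


def token_certainty(probs: Sequence[float], eps: float = 1e-12) -> float:
    """Assumed score: KL(U || p) for one next-token distribution.

    It is 0 for a uniform distribution and grows as the distribution
    becomes more peaked, i.e. as the model becomes more confident.
    """
    v = len(probs)
    # eps guards against log(0) for tokens with zero probability.
    return -sum(math.log(v * max(p, eps)) for p in probs) / v


def response_certainty(token_dists: Sequence[Sequence[float]]) -> float:
    """Average the per-token scores over all generated positions."""
    if not token_dists:
        raise ValueError("response has no token distributions")
    return sum(token_certainty(d) for d in token_dists) / len(token_dists)


def best_of_n(candidates: Sequence[Sequence[Sequence[float]]]) -> int:
    """Return the index of the candidate with the highest certainty score.

    Each candidate is a list of next-token probability distributions,
    one per generated token. No reward model is involved.
    """
    scores: List[float] = [response_certainty(c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    unsure = [[0.25, 0.25, 0.25, 0.25]] * 3      # flat distributions
    confident = [[0.97, 0.01, 0.01, 0.01]] * 3   # peaked distributions
    print(best_of_n([unsure, confident]))
```

In practice the distributions would come from the softmax over the model's logits at each generated position, so the score is a by-product of sampling rather than a separate evaluation pass.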
