Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs
This work addresses the challenge of balancing diversity and accuracy in LLM outputs for tasks requiring precision, though the contribution is incremental because it builds on existing sampling methods.
The paper tackles the problem of maintaining high precision in language model outputs, such as in mathematical reasoning, where uncontrolled high-temperature sampling degrades quality. It proposes selective sampling, which dynamically switches between greedy and high-temperature sampling based on a risk metric, and reports an improved quality-diversity trade-off in experiments.
Diversity is an essential metric for evaluating the creativity of outputs generated by language models. Temperature-based sampling is a common strategy to increase diversity. However, for tasks that require high precision, e.g., mathematical reasoning, uncontrolled high-temperature sampling, e.g., min-$p$ or top-$p$, degrades reasoning quality. We demonstrate that the loss of accuracy is caused by sampling incorrect continuations at sensitive decoding positions. To address this, we propose \textbf{selective sampling}, a method that dynamically switches between greedy and high-temperature sampling based on a sampling risk metric. This risk metric estimates the likelihood of output errors when applying high-temperature sampling at the current token position. To predict sampling risk, we train a lightweight classifier on a small subset of verifiable problems. The trained classifier can be integrated with the base language model with minimal latency overhead. Experiments on mathematical reasoning tasks demonstrate that selective sampling enhances the quality-diversity trade-off, even in high-temperature settings.
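The per-token switching rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `softmax` and `selective_sample`, the parameter `risk_threshold`, the default values, and the hard-threshold decision are all assumptions made for illustration, and the `risk` argument stands in for the output of the lightweight classifier, whose architecture and inputs the abstract does not specify.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, stabilized by subtracting the max."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_sample(
    logits: Sequence[float],
    risk: float,
    risk_threshold: float = 0.5,   # hypothetical cutoff, not from the paper
    temperature: float = 1.5,      # hypothetical "high" temperature
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the next token id for one decoding position.

    `risk` is assumed to be a score in [0, 1] estimating how likely
    high-temperature sampling at this position is to cause an error.
    """
    if risk >= risk_threshold:
        # Sensitive position: fall back to greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Low-risk position: sample from the temperature-scaled distribution.
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.0, 0.5]  # toy vocabulary of three tokens
    print(selective_sample(toy_logits, risk=0.9))  # greedy branch
    print(selective_sample(toy_logits, risk=0.1, rng=random.Random(0)))
```

In this sketch a high risk score always yields the argmax token, while a low risk score leaves the choice to high-temperature sampling, so diversity is only introduced at positions the risk estimate considers safe. In a real decoder the risk score would be recomputed at every position alongside the base model's forward pass.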