Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
FIRE sampling addresses a key bottleneck in developing general capabilities for LLMs, particularly in reasoning tasks with sandbox checkers, but the contribution appears incremental because it builds on existing sampling methods.
The paper tackles the challenge of efficiently sourcing diverse, high-quality data for large language models in reasoning tasks such as math or code. It introduces FIRE sampling, which enhances inference-time generation quality and also benefits training in the alignment stage.
Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains. A key challenge in developing these general capabilities is efficiently sourcing diverse, high-quality data. This becomes especially critical in reasoning-related tasks with sandbox checkers, such as math or code, where the goal is to generate correct solutions to specific problems with higher probability. In this work, we introduce Flaming-hot Initiation with Regular Execution (FIRE) sampling, a simple yet highly effective method to efficiently find good responses. Our empirical findings show that FIRE sampling enhances inference-time generation quality and also benefits training in the alignment stage. Furthermore, we explore how FIRE sampling improves performance by promoting diversity and analyze the impact of employing FIRE at different positions within a response.
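The abstract does not spell out the sampling procedure. As a minimal sketch, assuming the name is read literally (the first token is drawn at a very high temperature, restricted here to the top-k candidates, and every later token is drawn at a regular temperature), the idea could look like the following. The function names, the default temperatures, the `top_k` value, and the toy `next_logits` callback are all hypothetical and are not taken from the paper.

```python
import math
import random


def sample_token(logits, temperature, top_k=None, rng=random):
    """Sample one token index from temperature-scaled logits.

    If top_k is given, only the top_k highest-logit tokens are candidates.
    """
    indices = list(range(len(logits)))
    if top_k is not None:
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(indices, weights=weights, k=1)[0]


def fire_sample(next_logits, max_len, hot_temperature=30.0,
                regular_temperature=1.0, top_k=16, rng=random):
    """Hypothetical FIRE-style loop: hot first token, regular afterwards.

    next_logits is an assumed callback that maps the tokens generated so far
    to a list of logits for the next token.
    """
    tokens = []
    for step in range(max_len):
        logits = next_logits(tokens)
        if step == 0:
            # "Flaming-hot initiation": very high temperature, top-k filtered.
            token = sample_token(logits, hot_temperature, top_k, rng)
        else:
            # "Regular execution": ordinary temperature sampling.
            token = sample_token(logits, regular_temperature, None, rng)
        tokens.append(token)
    return tokens
```

Calling `fire_sample` several times on the same prompt would, under these assumptions, spread the first tokens almost uniformly over the top-k candidates while the continuations stay close to ordinary sampling. This is one way to picture the diversity effect the abstract describes.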