Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models
This work addresses efficient inference-time alignment for LLMs. It offers a flexible alternative to fine-tuning and reports significant performance gains, though the contribution is incremental because it builds on existing search methods.
The paper tackles suboptimal alignment in large language models by introducing AdaSearch, an adaptive blockwise search strategy that concentrates computational effort on the critical initial tokens of a response. Relative to Best-of-N baselines, it improves win-rates by over 10% on harmlessness generation, controlled sentiment generation, and mathematical reasoning.
LLM alignment remains a critical challenge. Inference-time methods provide a flexible alternative to fine-tuning, but their uniform computational effort often yields suboptimal alignment. We hypothesize that for many alignment tasks, the initial tokens of a response are disproportionately critical. To leverage this principle, we introduce AdaSearch, a novel blockwise search strategy. It adaptively allocates a fixed computational budget using a sampling schedule, focusing search effort on these critical tokens. We apply AdaSearch to sequential decoding and introduce its tree-search counterpart, AdaBeam. Our comprehensive evaluation across eight LLMs demonstrates that AdaSearch outperforms strong Best-of-N and fine-tuning baselines. Specifically, win-rates improve by over 10% relative to Best-of-N for harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks.
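The idea of spending a fixed sampling budget unevenly across blocks can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the geometric decay in `decaying_schedule`, the function names, and the toy sampler and reward, which stand in for a real LLM and a real reward model.

```python
import random
from typing import Callable, List

Tokens = List[int]


def decaying_schedule(total_budget: int, num_blocks: int, decay: float = 0.5) -> List[int]:
    """Split `total_budget` samples over `num_blocks`, front-loading early blocks.

    Assumption: geometric weights decay**i. The abstract does not state the
    actual schedule, so this is only one plausible choice.
    """
    if num_blocks < 1 or total_budget < num_blocks:
        raise ValueError("need at least one sample per block")
    weights = [decay ** i for i in range(num_blocks)]
    spare = total_budget - num_blocks  # every block keeps one guaranteed sample
    counts = [1 + int(spare * w / sum(weights)) for w in weights]
    # Hand any rounding leftovers to the earliest blocks first.
    for i in range(total_budget - sum(counts)):
        counts[i % num_blocks] += 1
    return counts


def blockwise_search(
    sample_block: Callable[[Tokens, random.Random], Tokens],
    reward: Callable[[Tokens], float],
    schedule: List[int],
    rng: random.Random,
) -> Tokens:
    """Grow a response block by block, keeping the best-scoring candidate each time."""
    prefix: Tokens = []
    for n_samples in schedule:
        # Draw n_samples continuations of the current prefix and score each one.
        candidates = [prefix + sample_block(prefix, rng) for _ in range(n_samples)]
        prefix = max(candidates, key=reward)
    return prefix


def toy_sample_block(prefix: Tokens, rng: random.Random, block_size: int = 4) -> Tokens:
    """Hypothetical stand-in for an LLM: emits `block_size` random token ids."""
    return [rng.randint(0, 9) for _ in range(block_size)]


def toy_reward(tokens: Tokens) -> float:
    """Hypothetical stand-in for a reward model: prefers larger token ids."""
    return float(sum(tokens))


if __name__ == "__main__":
    schedule = decaying_schedule(total_budget=16, num_blocks=4)
    print(schedule)
    print(blockwise_search(toy_sample_block, toy_reward, schedule, random.Random(0)))
```

With a budget of 16 samples and four blocks, this particular schedule assigns 8, 5, 2, and 1 samples to successive blocks, whereas a uniform blockwise Best-of-N would assign 4 to each. The total number of sampled blocks is the same in both cases; only its distribution over the response changes.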