The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus
This work addresses the high computational cost of LLM reasoning at inference time, a concern for both researchers and practitioners. It offers an incremental improvement: existing methods become more efficient without any fine-tuning.
The paper tackles the computational expense of inference strategies such as Self-Consistency in large language models. It introduces PoLR, a method that uses prefix consensus to guide reasoning trajectories, reducing token usage by up to 60% and latency by up to 50% while matching or exceeding Self-Consistency accuracy on benchmarks such as GSM8K and MATH500.
Large language models achieve strong reasoning performance, but inference strategies such as Self-Consistency (SC) are computationally expensive, as they fully expand all reasoning traces. We introduce PoLR (Path of Least Resistance), the first inference-time method to leverage prefix consistency for compute-efficient reasoning. PoLR clusters short prefixes of reasoning traces, identifies the dominant cluster, and expands all paths in that cluster, preserving the accuracy benefits of SC while substantially reducing token usage and latency. Our theoretical analysis, framed via mutual information and entropy, explains why early reasoning steps encode strong signals predictive of final correctness. Empirically, PoLR consistently matches or exceeds SC across GSM8K, MATH500, AIME24/25, and GPQA-DIAMOND, reducing token usage by up to 60% and wall-clock latency by up to 50%. Moreover, PoLR is fully complementary to adaptive inference methods (e.g., Adaptive Consistency, Early-Stopping SC) and can serve as a drop-in pre-filter, making SC substantially more efficient and scalable without requiring model fine-tuning.
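The cluster, select, and expand procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering method, so the token-level Jaccard similarity, the greedy clustering rule, the `threshold` value, and the `expand` callback (a hypothetical stand-in for completing a reasoning trace with the model and extracting its final answer) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed metric, not from the paper)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster_prefixes(prefixes: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering (assumed): a prefix joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters: List[List[int]] = []
    for i, prefix in enumerate(prefixes):
        for members in clusters:
            if jaccard(prefixes[members[0]], prefix) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def polr(
    prefixes: List[str],
    expand: Callable[[str], str],
    threshold: float = 0.5,
) -> Tuple[str, List[int]]:
    """Expand only the prefixes in the largest cluster, then majority-vote.

    `expand` is a hypothetical callback that completes a reasoning trace
    from a prefix and returns the final answer as a string.
    """
    clusters = cluster_prefixes(prefixes, threshold)
    dominant = max(clusters, key=len)  # ties resolve to the earliest cluster
    answers = [expand(prefixes[i]) for i in dominant]
    majority = Counter(answers).most_common(1)[0][0]
    return majority, dominant
```

In this sketch, prefixes outside the dominant cluster are never passed to `expand`, which is where the token and latency savings relative to full Self-Consistency would come from. Because the output is still a set of sampled paths followed by a vote, an adaptive stopping rule could in principle be applied to the expansion loop.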