CL | Nov 6, 2025

Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning

arXiv:2511.04654v1 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses computational waste in chain-of-thought reasoning for users of large language models, offering an incremental improvement in efficiency.

The paper tackles the computational inefficiency of generating full rationales in Chain-of-Thought prompting by introducing LEASH, a training-free decoding algorithm that adaptively halts rationale generation. It reduces average token generation by 30-35% and latency by 27%, at the cost of a 10 percentage point accuracy drop relative to CoT.

Chain-of-Thought (CoT) prompting is a key technique for enabling complex reasoning in large language models. However, generating full, fixed-length rationales is computationally wasteful, inflating both token usage and latency. We introduce LEASH: Logit-Entropy Adaptive Stopping Heuristic, a training-free decoding algorithm that adaptively halts rationale generation. LEASH monitors two intrinsic signals: the slope of token-level entropy and the improvement in the top-logit margin. It terminates the generation once both signals plateau, indicating the model has reached a stable reasoning state. Across four instruction-tuned models on the GSM8K and AQuA-RAT benchmarks, LEASH reduces average token generation by 30-35% and latency by 27%, while incurring a 10 p.p. accuracy drop relative to CoT. LEASH is model-agnostic and requires no additional training or supervision, offering a simple and efficient alternative to CoT decoding.
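
The stopping rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the window size, the two thresholds (`slope_eps`, `margin_eps`), the least-squares slope estimate, and all function names here are assumptions chosen only to show the idea of halting once both the entropy slope and the top-logit margin improvement have flattened.

```python
import math

def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)

def top_margin(logits):
    """Gap between the largest and second-largest logit."""
    top = sorted(logits, reverse=True)[:2]
    return top[0] - top[1]

def slope(values):
    """Least-squares slope of values against their indices 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def should_stop(entropies, margins, window=4, slope_eps=0.01, margin_eps=0.05):
    """Return True once both signals have plateaued over the last `window` steps.

    All defaults are hypothetical; they are not taken from the paper.
    """
    if len(entropies) < window or len(margins) < window:
        return False                     # not enough history yet
    flat_entropy = abs(slope(entropies[-window:])) < slope_eps
    flat_margin = (margins[-1] - margins[-window]) < margin_eps
    return flat_entropy and flat_margin
```

In a decoding loop, one would append `entropy(step_logits)` and `top_margin(step_logits)` to two running lists after each generated token and end the rationale when `should_stop` returns True. Requiring both conditions mirrors the abstract's statement that generation terminates once both signals plateau.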

