ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference
ESTAR addresses an efficiency problem for users of large reasoning models. It enables faster inference without loss of accuracy, although it is an incremental improvement on existing methods.
The paper tackles the problem of redundant reasoning in large reasoning models by introducing ESTAR, which reduces reasoning length by about 3.7x (from 4,799 to 1,290 tokens) while preserving accuracy (74.9% vs. 74.2%) across four datasets.
Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains of thought, but often waste computation on redundant reasoning after the correct answer has already been reached. We introduce Early-Stopping for Token-Aware Reasoning (ESTAR), which detects and reduces such reasoning redundancy to improve efficiency without sacrificing accuracy. Our method combines (i) a trajectory-based classifier that identifies when reasoning can be safely stopped, (ii) supervised fine-tuning to teach LRMs to propose self-generated <stop> signals, and (iii) <stop>-aware reinforcement learning that truncates rollouts at self-generated stop points with compute-aware rewards. Experiments on four reasoning datasets show that ESTAR reduces reasoning length by about 3.7x (from 4,799 to 1,290 tokens) while preserving accuracy (74.9% vs. 74.2%), with strong cross-domain generalization. These results highlight early stopping as a simple yet powerful mechanism for improving reasoning efficiency in LRMs.
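Component (iii) truncates RL rollouts at a self-generated stop point and scores them with a compute-aware reward. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The stop token string `"<stop>"`, the helper names `truncate_at_stop` and `compute_aware_reward`, and the reward shape are all hypothetical. The reward gives 1 for a correct answer minus a penalty proportional to the fraction of a token budget used, and 0 for an incorrect one. The summary does not specify the actual reward.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical: the string the model emits as its self-generated stop signal.
STOP_TOKEN = "<stop>"


@dataclass
class Rollout:
    tokens: list[str]    # reasoning tokens kept after truncation
    stopped_early: bool  # True if a stop signal cut the rollout short


def truncate_at_stop(tokens: Sequence[str], stop_token: str = STOP_TOKEN) -> Rollout:
    """Keep reasoning tokens up to (not including) the first stop signal."""
    for i, tok in enumerate(tokens):
        if tok == stop_token:
            return Rollout(list(tokens[:i]), True)
    # No stop signal: the full rollout is kept.
    return Rollout(list(tokens), False)


def compute_aware_reward(
    rollout: Rollout,
    is_correct: bool,
    max_len: int,
    length_weight: float = 0.5,  # hypothetical trade-off coefficient
) -> float:
    """Hypothetical compute-aware reward.

    Correct answers earn 1 minus a penalty proportional to the share of the
    token budget used; incorrect answers earn 0, so shortening reasoning is
    only rewarded when the answer stays correct.
    """
    if not is_correct:
        return 0.0
    used = min(len(rollout.tokens), max_len) / max_len
    return 1.0 - length_weight * used
```

For example, `truncate_at_stop(["a", "b", "c", "<stop>", "d", "e"])` keeps three tokens. With `max_len=10` and a correct answer, the reward is 1 - 0.5 * 0.3 = 0.85. Applying the length penalty only to correct answers is one common way to avoid rewarding short but wrong reasoning.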