Do LLMs Encode Functional Importance of Reasoning Tokens?

arXiv:2601.03066 | 44.63 citations | h-index: 3

AI Analysis

For researchers and practitioners seeking to compress reasoning chains in LLMs, this work provides a diagnostic method and presents evidence that models encode token-level functional importance, which could enable more efficient distillation.

This paper investigates whether LLMs internally encode functional importance of reasoning tokens. The authors propose greedy pruning, which removes tokens with minimal impact on likelihood, and show that students trained on pruned chains outperform a frontier-model-supervised compression baseline at matched reasoning lengths.

Large language models solve complex tasks by generating long reasoning chains, achieving higher accuracy at the expense of greater computational cost and a reduced ability to isolate functionally relevant reasoning. Prior work on compact reasoning shortens such chains through probabilistic sampling, heuristics, or supervision from frontier models, but offers limited insight into whether models internally encode token-level functional importance for answer generation. We address this gap diagnostically and propose greedy pruning, a likelihood-preserving deletion procedure that iteratively removes reasoning tokens whose removal minimally degrades model likelihood under a specified objective, yielding length-controlled reasoning chains. We evaluate pruned reasoning in a distillation framework and show that students trained on pruned chains outperform a frontier-model-supervised compression baseline at matched reasoning lengths. Finally, our analysis reveals systematic pruning patterns and shows that attention scores can predict greedy pruning ranks, further suggesting that models encode a nontrivial functional importance structure over reasoning tokens.
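
The greedy pruning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `greedy_prune`, `toy_score`, and `TOY_WEIGHTS` are hypothetical, and the toy scorer is only a stand-in for a model likelihood computed under a specified objective. At each step the sketch deletes the single token whose removal leaves the score highest, until the chain reaches the target length.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-token weights standing in for a model's likelihood signal.
# A real scorer would query an LLM; this table exists only for illustration.
TOY_WEIGHTS = {
    "so": 0.1, "obviously": 0.2, "+": 0.8, "=": 0.9,
    "2": 1.0, "3": 1.0, "5": 2.0,
}


def toy_score(chain: Sequence[str]) -> float:
    """Toy stand-in for a likelihood objective: higher means better preserved."""
    return sum(TOY_WEIGHTS.get(tok, 0.0) for tok in chain)


def greedy_prune(
    tokens: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    target_len: int,
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Delete one token per step, always the one whose removal hurts score_fn least.

    Returns the pruned chain and the removal order as (original index, token)
    pairs, so the first entry is the token judged least important.
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    kept = list(tokens)
    positions = list(range(len(kept)))  # original index of each surviving token
    removed: List[Tuple[int, str]] = []
    while len(kept) > target_len:
        # Score every single-token deletion and keep the best-scoring one.
        best = max(
            range(len(kept)),
            key=lambda i: score_fn(kept[:i] + kept[i + 1:]),
        )
        removed.append((positions.pop(best), kept.pop(best)))
    return kept, removed


chain = ["so", "2", "+", "3", "=", "5", "obviously"]
kept, removed = greedy_prune(chain, toy_score, target_len=4)
print(kept)     # ['2', '3', '=', '5']
print(removed)  # [(0, 'so'), (6, 'obviously'), (2, '+')]
```

The removal order doubles as an importance ranking over tokens, which is the kind of rank the abstract says attention scores can predict. Note that this naive loop calls the scorer once per candidate deletion at every step, so the number of scorer calls grows quadratically with chain length.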
