LG | AI | Feb 27

Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation

arXiv:2603.13274 | h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses excessive computational overhead in AI reasoning tasks and offers a lightweight route to more efficient inference. The advance is incremental, since it builds on existing self-distillation techniques.

The paper tackles the problem of high computational cost in reasoning-oriented language models by introducing Truncated-Reasoning Self-Distillation (TRSD), a post-training method that enables models to produce correct predictions from partial reasoning traces, reducing inference-time costs and improving robustness across multiple benchmarks.

Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with substantial and often excessive computational cost, which can materialize in redundant or inefficient reasoning. We study this setting and introduce Truncated-Reasoning Self-Distillation (TRSD), a lightweight post-training procedure that encourages models to produce correct predictions from partial reasoning traces. In TRSD, a frozen teacher model first generates a full reasoning trace and evaluates the corresponding answer distribution conditioned on the prompt and the complete reasoning to construct a synthetic training target. A student model with the same architecture is then trained to match the teacher's answer distribution while being conditioned only on a truncated prefix of its reasoning trace. Across multiple reasoning benchmarks and token budgets, we demonstrate that TRSD improves robustness to truncated inference, with far reduced accuracy tradeoffs when applied to a diverse set of reasoning models. Moreover, although never explicitly regularized for shorter generation during training, we also find that TRSD-trained models inherently output shorter reasoning traces without truncation, significantly reducing inference-time costs even without artificial interventions.
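The training signal described in the abstract can be illustrated with a minimal sketch. The abstract only says the student is trained to "match the teacher's answer distribution", so the KL divergence used here is an assumption, as are the fixed-budget truncation rule, the toy logits, and every function name (`truncate_prefix`, `trsd_loss`, and so on). It is not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw answer logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def truncate_prefix(reasoning_tokens, budget):
    """Keep only the first `budget` tokens of the teacher's full trace.
    (Hypothetical truncation rule; the abstract does not specify one.)"""
    return reasoning_tokens[:budget]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def trsd_loss(teacher_answer_logits, student_answer_logits):
    """Assumed distillation loss: KL from the frozen teacher's answer
    distribution (conditioned on the full trace) to the student's answer
    distribution (conditioned on the truncated prefix)."""
    teacher_probs = softmax(teacher_answer_logits)
    student_probs = softmax(student_answer_logits)
    return kl_divergence(teacher_probs, student_probs)

# Toy example: in practice both sets of logits would come from model
# forward passes; here they are hard-coded placeholders.
full_trace = ["step1", "step2", "step3", "step4"]
prefix = truncate_prefix(full_trace, budget=2)   # what the student sees
teacher_logits = [2.0, 0.5, -1.0]                # prompt + full trace
student_logits = [0.3, 0.2, 0.1]                 # prompt + prefix only
loss = trsd_loss(teacher_logits, student_logits)
```

In a real setup the teacher's weights stay frozen and only the student receives gradients from this loss; the sketch leaves out the optimizer step and the model calls.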

