CL · Jun 13, 2025

DART: Distilling Autoregressive Reasoning to Silent Thought

arXiv:2506.11752v2 · 8 citations · h-index: 2 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the inference cost of deploying LLMs in latency-sensitive applications and represents an incremental improvement in efficient reasoning methods.

The paper tackles the computational overhead of autoregressive Chain-of-Thought reasoning in LLMs by proposing DART, a self-distillation framework that enables non-autoregressive Silent Thought, achieving significant performance gains over existing non-autoregressive baselines without extra inference latency.

Chain-of-Thought (CoT) reasoning has significantly advanced Large Language Models (LLMs) in solving complex tasks. However, its autoregressive paradigm leads to significant computational overhead, hindering its deployment in latency-sensitive applications. To address this, we propose DART (Distilling Autoregressive Reasoning to Silent Thought), a self-distillation framework that enables LLMs to replace autoregressive CoT with non-autoregressive Silent Thought (ST). Specifically, DART introduces two training pathways: the CoT pathway for traditional reasoning and the ST pathway for generating answers directly from a few ST tokens. The ST pathway utilizes a lightweight Reasoning Evolvement Module (REM) to align its hidden states with the CoT pathway, enabling the ST tokens to evolve into informative embeddings. During inference, only the ST pathway is activated, leveraging evolving ST tokens to deliver the answer directly. Extensive experimental results demonstrate that DART offers significant performance gains compared with existing non-autoregressive baselines without extra inference latency, serving as a feasible alternative for efficient reasoning.
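
The two-pathway training idea in the abstract can be illustrated with a toy objective. This is a minimal sketch under stated assumptions, not the paper's actual loss: the abstract does not give the formula, so the choice of cross-entropy for the answer, mean squared error for hidden-state alignment, and the names `dart_style_loss` and `align_weight` are all hypothetical. Plain Python lists stand in for model logits and hidden states.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dart_style_loss(cot_logits, st_logits, target,
                    cot_hidden, st_hidden, align_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    Both pathways are trained to predict the same answer token, and an
    alignment term pulls the ST pathway's hidden state toward the CoT
    pathway's hidden state.
    """
    answer_loss = cross_entropy(cot_logits, target) + cross_entropy(st_logits, target)
    align_loss = mse(st_hidden, cot_hidden)
    return answer_loss + align_weight * align_loss


# Toy values: a 3-way answer vocabulary and 2-dimensional hidden states.
loss = dart_style_loss(
    cot_logits=[2.0, 0.5, 0.1],
    st_logits=[1.0, 0.8, 0.3],
    target=0,
    cot_hidden=[0.4, -0.2],
    st_hidden=[0.1, 0.3],
)
print(f"combined loss: {loss:.4f}")
```

In this sketch the alignment term is zero when the ST hidden state matches the CoT hidden state, so only the two answer losses remain. At inference only the ST pathway would be evaluated, which is consistent with the abstract's claim of no extra inference latency.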

