CL · Feb 4

CoLT: Reasoning with Chain of Latent Tool Calls

arXiv:2602.04246v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses slow token-level reasoning in LLMs, a concern for both researchers and practitioners, and offers an incremental improvement over prior latent reasoning methods.

The paper tackles the inefficiency of existing latent reasoning methods for Large Language Models by proposing CoLT, a framework that implements latent reasoning as tool calls: the main model generates seed tokens, and a smaller external model unpacks them into full reasoning steps. On four mathematical datasets, CoLT achieves higher accuracy and shorter reasoning lengths than baseline latent models.

Chain-of-Thought (CoT) is a critical technique for enhancing the reasoning ability of Large Language Models (LLMs), and latent reasoning methods have been proposed to accelerate the inefficient token-level reasoning chain. We notice that existing latent reasoning methods generally require model structure augmentation and exhaustive training, limiting their broader applicability. In this paper, we propose CoLT, a novel framework that implements latent reasoning as "tool calls". Instead of reasoning entirely in the latent space, CoLT generates seed tokens that carry the information of a reasoning step. When a latent tool call is triggered, a smaller external model takes the hidden states of the seed tokens as its input and unpacks the seed tokens back into a full reasoning step. In this way, we can ensure that the main model reasons in the explicit token space, preserving its ability while improving efficiency. Experimental results on four mathematical datasets demonstrate that CoLT achieves higher accuracy and shorter reasoning length than baseline latent models, and is compatible with reinforcement learning algorithms and different decoder structures.
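The control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Emission`, `run_colt_loop`, `make_scripted_model`, `toy_unpack`, and the `<call>` / `<eos>` tokens are all hypothetical, the "models" are scripted stand-ins rather than neural networks, and it assumes that the unpacked step (not the seed tokens) is what gets appended to the visible reasoning trace, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]

CALL_TOKEN = "<call>"  # hypothetical token that triggers a latent tool call
EOS_TOKEN = "<eos>"    # hypothetical end-of-sequence token


@dataclass
class Emission:
    token: str      # token emitted by the main model
    hidden: Vector  # hidden state the main model produced for that token


def run_colt_loop(
    main_step: Callable[[List[str]], Emission],
    unpack: Callable[[List[Vector]], List[str]],
    prompt: List[str],
    max_steps: int = 32,
) -> List[str]:
    """Alternate between seed generation and latent tool calls (toy version)."""
    context = list(prompt)
    seed_hidden: List[Vector] = []
    for _ in range(max_steps):
        emission = main_step(context)
        if emission.token == EOS_TOKEN:
            break
        if emission.token == CALL_TOKEN:
            # Latent tool call: the smaller model expands the collected seed
            # hidden states into an explicit reasoning step.
            context.extend(unpack(seed_hidden))
            seed_hidden = []
        else:
            # Assumption: every other emitted token is treated as a seed token.
            seed_hidden.append(emission.hidden)
    return context


def make_scripted_model(script: List[Emission]) -> Callable[[List[str]], Emission]:
    """Stand-in for the main LLM: replays a fixed list of emissions."""
    it = iter(script)

    def step(context: List[str]) -> Emission:
        return next(it, Emission(EOS_TOKEN, []))

    return step


def toy_unpack(hidden_states: List[Vector]) -> List[str]:
    """Stand-in for the smaller decoder: looks up a step per hidden state."""
    table = {0: "2 + 3 = 5", 1: "5 * 4 = 20"}
    return [table[int(h[0])] for h in hidden_states]


script = [
    Emission("<seed>", [0.0]),
    Emission(CALL_TOKEN, []),
    Emission("<seed>", [1.0]),
    Emission(CALL_TOKEN, []),
    Emission(EOS_TOKEN, []),
]
trace = run_colt_loop(make_scripted_model(script), toy_unpack, ["Q: (2 + 3) * 4 = ?"])
print(trace)
```

In this toy run the main model emits only five tokens, while the visible trace still contains two fully written reasoning steps. That is the efficiency argument of the abstract in miniature: the expensive model produces short seeds, and the cheaper model does the verbose decoding.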

