LG AI Sep 16, 2025

Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors

Aniket Didolkar, Nicolas Ballas, Sanjeev Arora, Anirudh Goyal

arXiv:2509.13237v1 26.023 citations h-index: 12

Originality: Incremental advance

AI Analysis

This work targets the token and latency cost of multi-step LLM reasoning, a concern for AI researchers and practitioners, and offers an incremental improvement by reusing reasoning patterns the model has already derived.

The paper tackles the problem of LLMs re-deriving the same intermediate reasoning steps across problems, which inflates token usage and latency. It introduces a mechanism that converts recurring reasoning fragments into reusable behaviors through the model's own metacognitive analysis of prior traces. Supplying these behaviors in-context reduces reasoning tokens by up to 46% while matching or improving baseline accuracy, behavior-guided self-improvement yields up to 10% higher accuracy than a naive critique-and-revise baseline, and supervised fine-tuning on behavior-conditioned traces converts non-reasoning models into reasoning models more effectively than vanilla SFT.

Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. In the process, they often re-derive the same intermediate steps across problems, inflating token usage and latency. This saturation of the context window leaves less capacity for exploration. We study a simple mechanism that converts recurring reasoning fragments into concise, reusable "behaviors" (name + instruction) via the model's own metacognitive analysis of prior traces. These behaviors are stored in a "behavior handbook", which supplies them to the model in-context at inference or distills them into parameters via supervised fine-tuning. This approach achieves improved test-time reasoning across three different settings: 1) Behavior-conditioned inference: Providing the LLM with relevant behaviors in-context during reasoning reduces the number of reasoning tokens by up to 46% while matching or improving baseline accuracy; 2) Behavior-guided self-improvement: Without any parameter updates, the model improves its own future reasoning by leveraging behaviors from its own past problem-solving attempts. This yields up to 10% higher accuracy than a naive critique-and-revise baseline; and 3) Behavior-conditioned SFT: SFT on behavior-conditioned reasoning traces is more effective at converting non-reasoning models into reasoning models than vanilla SFT. Together, these results indicate that turning slow derivations into fast procedural hints enables LLMs to remember how to reason, not just what to conclude.
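
The behavior handbook and behavior-conditioned inference described above can be pictured with a minimal Python sketch. Everything here is an illustrative assumption, not the authors' implementation: the class names `Behavior` and `BehaviorHandbook`, the `build_prompt` helper, the prompt wording, and in particular the keyword-overlap retrieval, since the abstract does not say how relevant behaviors are selected. The step in which the model extracts behaviors from its own prior traces is not shown; entries are added by hand.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stopword list so that filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "for", "is", "with"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its alphabetic words minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


@dataclass(frozen=True)
class Behavior:
    name: str          # short handle for the reusable reasoning pattern
    instruction: str   # concise procedural hint


@dataclass
class BehaviorHandbook:
    behaviors: list[Behavior] = field(default_factory=list)

    def add(self, name: str, instruction: str) -> None:
        self.behaviors.append(Behavior(name, instruction))

    def retrieve(self, question: str, k: int = 2) -> list[Behavior]:
        """Return up to k behaviors sharing at least one keyword with the question.

        Keyword overlap is an assumed stand-in for whatever retrieval
        the actual system uses.
        """
        q_words = tokenize(question)
        scored = []
        for b in self.behaviors:
            overlap = len(q_words & tokenize(b.name + " " + b.instruction))
            if overlap > 0:
                scored.append((overlap, b))
        # sorted() is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [b for _, b in scored[:k]]


def build_prompt(question: str, behaviors: list[Behavior]) -> str:
    """Prepend retrieved behaviors to the question as in-context hints."""
    lines = ["Behaviors that may help:"]
    lines += [f"- {b.name}: {b.instruction}" for b in behaviors]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

In use, a caller would populate the handbook, call `retrieve` for each new question, and send the string returned by `build_prompt` to the model. The intent is that a short hint replaces a derivation the model would otherwise repeat, which is where the reported token savings would come from.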
