LG | CL | Mar 18, 2025

PENCIL: Long Thoughts with Short Memory

arXiv:2503.14337v2 | 19 citations | h-index: 27 | ICML
Originality: Highly original
AI Analysis

PENCIL addresses a memory bottleneck in scaling test-time reasoning for LLMs and offers a more efficient approach to complex problem-solving, though it may appear incremental because it builds on existing CoT methods.

The authors tackle inefficient memory usage in long Chain-of-Thought (CoT) reasoning by introducing PENCIL, a method that recursively cleans up intermediate thoughts during generation, enabling deeper reasoning with a shorter context and less compute. For example, PENCIL with a 25M-parameter transformer and a 2048-token context length solved Einstein's puzzle, a task that challenges much larger models like GPT-4.

While state-of-the-art LLMs have demonstrated great promise in using long Chains-of-Thought (CoT) to boost reasoning, scaling it up to more challenging problems at test-time is fundamentally limited by suboptimal memory usage -- intermediate computations accumulate indefinitely in context even when no longer needed for future thoughts. We introduce PENCIL, which incorporates a novel reduction mechanism into the autoregressive generation process that recursively cleans up intermediate thoughts based on patterns learned from training. By iteratively generating and erasing thoughts, PENCIL can think deeper to solve harder problems using shorter context and less compute. Empirically, we observe that PENCIL is significantly more effective and efficient than CoT. For example, we demonstrate that PENCIL with a small 25M-parameter transformer and 2048 context length solves Einstein's puzzle -- a task that challenges much larger models like GPT-4. Theoretically, we prove that PENCIL can perform universal efficient computation by simulating any Turing machine with optimal time and space complexity, and thus can solve arbitrary computable tasks that are otherwise intractable for vanilla CoT.
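
The generate-and-erase loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the marker tokens `[CALL]`, `[SEP]`, and `[RETURN]`, the helper names, and the exact erasure rule are assumptions made for illustration, and a scripted token stream stands in for a trained model. The idea shown is that once a sub-computation returns, its intermediate thoughts are dropped from the context and only its answer is kept.

```python
from typing import List, Tuple

# Hypothetical marker tokens (assumed for this sketch, not taken from the abstract).
CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"


def _last_index(tokens: List[str], target: str, end: int) -> int:
    """Index of the last occurrence of target in tokens[:end], or -1 if absent."""
    for i in range(end - 1, -1, -1):
        if tokens[i] == target:
            return i
    return -1


def reduce_context(tokens: List[str]) -> List[str]:
    """Assumed rule: prefix CALL thoughts SEP answer RETURN -> prefix answer."""
    if not tokens or tokens[-1] != RETURN:
        return tokens
    sep = _last_index(tokens, SEP, len(tokens) - 1)
    if sep < 0:
        return tokens
    call = _last_index(tokens, CALL, sep)
    if call < 0:
        return tokens
    # Keep everything before CALL, plus the answer between SEP and RETURN.
    return tokens[:call] + tokens[sep + 1:-1]


def run(stream: List[str]) -> Tuple[List[str], int]:
    """Feed tokens one at a time, reducing after each; return final context and peak length."""
    context: List[str] = []
    peak = 0
    for tok in stream:
        context.append(tok)
        peak = max(peak, len(context))
        context = reduce_context(context)
    return context, peak


if __name__ == "__main__":
    # Scripted stand-in for model output: a nested sub-problem inside an outer one.
    stream = ["Q", CALL, "t1", "t2", CALL, "u1", "u2", SEP, "a_inner", RETURN,
              "t3", SEP, "answer", RETURN]
    final, peak = run(stream)
    print(final, peak, len(stream))
```

In this toy run the stream contains 14 tokens, yet the context never holds more than 10 at once and ends as just the prompt and the answer. Because inner sub-computations are erased as soon as they return, the last `[SEP]` and the last `[CALL]` before it always belong to the sub-computation that is returning.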

Code Implementations: 1 repo
