Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Sketch-of-Thought addresses the efficiency of LLM reasoning for users who need faster, less resource-intensive processing. The contribution is incremental, since it builds on existing prompting methods.
The paper tackles the excessive verbosity and computational overhead of Chain-of-Thought prompting by proposing Sketch-of-Thought, a prompting framework that reduces token usage by up to 84% across 18 reasoning datasets with minimal accuracy loss, and that improves accuracy on some tasks such as mathematical and multi-hop reasoning.
Recent advances in large language models (LLMs) have enabled strong reasoning capabilities through Chain-of-Thought (CoT) prompting, which elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs, leading to increased computational overhead. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints to reduce token usage while preserving reasoning accuracy. SoT is designed as a flexible, modular approach and is instantiated with three paradigms--Conceptual Chaining, Chunked Symbolism, and Expert Lexicons--each tailored to distinct reasoning tasks and selected dynamically at test-time by a lightweight routing model. Across 18 reasoning datasets spanning multiple domains, languages, and modalities, SoT achieves token reductions of up to 84% with minimal accuracy loss. In tasks such as mathematical and multi-hop reasoning, it even improves accuracy while shortening outputs.
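The test-time flow described in the abstract (route a question to one of three paradigms, then prompt the model under that paradigm's constraints) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt strings, the keyword list, the pairing of paradigms with task types, and the names `route`, `build_prompt`, and `token_reduction` are all hypothetical, and the keyword heuristic only stands in for the paper's learned routing model.

```python
# Hypothetical instruction stubs; the paper's actual prompts are not reproduced here.
PARADIGM_PROMPTS = {
    "conceptual_chaining": "Reason with a short chain of linked key concepts.",
    "chunked_symbolism": "Reason with compact variables and equations.",
    "expert_lexicons": "Reason with domain shorthand and abbreviations.",
}

# Assumed trigger words for specialist questions (illustrative only).
EXPERT_TERMS = ("diagnosis", "dosage", "circuit", "statute")


def route(question: str) -> str:
    """Toy stand-in for the lightweight routing model.

    The pairing of paradigms with task types is an assumption:
    numbers -> symbolic, specialist terms -> lexicon, otherwise chaining.
    """
    text = question.lower()
    if any(ch.isdigit() for ch in text):
        return "chunked_symbolism"
    if any(term in text for term in EXPERT_TERMS):
        return "expert_lexicons"
    return "conceptual_chaining"


def build_prompt(question: str) -> str:
    """Prefix the question with the instruction of the selected paradigm."""
    paradigm = route(question)
    return f"{PARADIGM_PROMPTS[paradigm]}\n\nQuestion: {question}\nAnswer:"


def token_reduction(cot_tokens: int, sot_tokens: int) -> float:
    """Fraction of output tokens saved relative to a CoT baseline."""
    if cot_tokens <= 0:
        raise ValueError("cot_tokens must be positive")
    return 1.0 - sot_tokens / cot_tokens


if __name__ == "__main__":
    print(build_prompt("What is 12 * 7?"))
    # Made-up token counts, chosen only to show the arithmetic.
    print(f"{token_reduction(cot_tokens=100, sot_tokens=16):.0%}")
```

Under this reading, a reduction of "up to 84%" means that the shortest sketched output uses roughly 16 tokens for every 100 a Chain-of-Thought answer would use. In practice the router would be a trained classifier rather than a keyword match.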