Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs

Kiran Tomlinson, Tobias Schnabel, Adith Swaminathan, Jennifer Neville

arXiv:2602.02909v12.4

Originality Incremental advance

AI Analysis

This addresses fundamental bottlenecks in inference-time compute for LLM users, offering a principled analysis tool, but it is incremental as it extends an existing theoretical model.

The paper tackled the problem of how many reasoning tokens are required for chain-of-thought (CoT) reasoning in LLMs as input size grows, proving lower bounds of Ω(n) tokens for three tasks and showing experimental results with approximately linear scaling and failures under constraints.

Inference-time scaling via chain-of-thought (CoT) reasoning is a major driver of state-of-the-art LLM performance, but it comes with substantial latency and compute costs. We address a fundamental theoretical question: how many reasoning tokens are required to solve a problem as input size grows? By extending the bounded attention prefix oracle (BAPO) model--an abstraction of LLMs that quantifies the information flow required to solve a task--we prove lower bounds on the CoT tokens required for three canonical BAPO-hard tasks: binary majority, triplet matching, and graph reachability. We show that each requires $Ω(n)$ reasoning tokens when the input size is $n$. We complement these results with matching or near-matching upper bounds via explicit constructions. Finally, our experiments with frontier reasoning models show approximately linear reasoning token scaling on these tasks and failures when constrained to smaller reasoning budgets, consistent with our theoretical lower bounds. Together, our results identify fundamental bottlenecks in inference-time compute through CoT and offer a principled tool for analyzing optimal reasoning length.

View on arXiv PDF

Similar