ToT (LLM reasoning / chain-of-thought): heavily superseded, a standard baseline that newer methods routinely beat. 9 papers critique it and 5 beat it on benchmarks, ranking it #3 of 772 most-superseded. Sub-problem: cluster led by Chain-of-Thought. Newer alternatives in the same sub-problem include Marginal Sharpening, Tree-of-Thoughts, Co-ReAct, MA-CoT, Novelty-based Tree-of-Thought Search.


Heavily superseded · #3 of 772 most-superseded

ToT

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

LLM reasoning / chain-of-thought · first seen May 17, 2023

heavily superseded — a standard baseline that newer methods routinely beat

9 papers critique it · 5 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites ToT as a baseline.

“ToT implementations perform a predetermined number of reasoning steps, which can lead to ‘overthinking’”
— STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts
“This deliberate structure improves performance over standard Chain-of-Thought (CoT) prompting but incurs high computational cost.”
— Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs
“The critical bottleneck in this workflow lies with the state evaluator. In the original ToT work, the evaluator relies on expensive LLM self-reflection, which involves prompting the model to critique its own outputs. This introduces substantial computational overhead, making the process impractical for many applications.”
— Domain-Specialized Tree of Thought through Plug-and-Play Predictors
“we demonstrate that CoT and its reasoning variants (e.g., ToT, ReAct) consistently underperform direct answering by a significant margin”
— The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
“reasoning trees exhibit intractable branching factors and depth, while PRMs may fail to accurately evaluate intermediate steps”
— Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs
“In all of these methods the guidance signal---critique, scoring model, abstraction prompt---is produced by an untrained, prompted LLM.”
— Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents
“Attempts to remedy this through more complicated methods such as Tree of Thoughts (ToT) suffer from drawbacks such as high computation cost. In ToT specifically, the cost stems from branching "thoughts" that lead to exponential runtime and token usage during the graph search.”
— Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning
“ToT generates multiple leaf nodes as potential answers, but without a verifier, it is unclear which leaf node should be selected as the final solution.”
— BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
“Tree-of-Thought (ToT) performs hierarchical branching but may suffer from exponential growth”
— Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
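
Several of the critiques above describe the same cost structure: a fixed number of expansion steps, one evaluator call per candidate state, and a state count that grows with the branching factor. The following is a minimal Python sketch of a ToT-style breadth-first search that counts those calls, with and without pruning to a beam. It is an illustration under stated assumptions, not the ToT authors' implementation: `tot_bfs`, `propose`, `evaluate`, and `BRANCHING` are hypothetical names, and the toy `propose` and `evaluate` stand in for what would be LLM calls.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]
BRANCHING = 3  # assumed branching factor for the toy problem


def propose(state: State) -> List[State]:
    """Toy thought generator: extend the state with each digit 0..BRANCHING-1."""
    return [state + (d,) for d in range(BRANCHING)]


def evaluate(state: State) -> float:
    """Toy state evaluator: higher digit sum is better (stands in for an LLM critique)."""
    return float(sum(state))


def tot_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    depth: int,
    beam: Optional[int] = None,
) -> Tuple[State, int, int]:
    """Run a fixed number of expansion steps.

    Returns (best_state, propose_calls, evaluate_calls).
    With beam=None every candidate is kept, so the frontier grows as BRANCHING**step.
    """
    frontier = [root]
    propose_calls = 0
    evaluate_calls = 0
    for _ in range(depth):  # predetermined number of reasoning steps
        candidates: List[State] = []
        for state in frontier:
            propose_calls += 1
            candidates.extend(propose(state))
        scored = []
        for cand in candidates:
            evaluate_calls += 1  # one evaluator call per candidate state
            scored.append((evaluate(cand), cand))
        # Stable sort on score only, best first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if beam is not None:
            scored = scored[:beam]  # prune to the top-`beam` states
        frontier = [cand for _, cand in scored]
    return frontier[0], propose_calls, evaluate_calls


full = tot_bfs((), propose, evaluate, depth=4)
pruned = tot_bfs((), propose, evaluate, depth=4, beam=2)
print(full)    # ((2, 2, 2, 2), 40, 120)
print(pruned)  # ((2, 2, 2, 2), 7, 21)
```

In this toy setting the unpruned search makes 120 evaluator calls at depth 4, against 21 with a beam of 2. The gap widens with depth, which is the growth the exponential-cost critiques describe. The evaluator count is the quantity the state-evaluator critique targets, since each call would be an LLM prompt in the setup those papers discuss.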

Beaten on benchmarks

Head-to-head results where a newer method reports beating ToT. Values are copied from the source paper's tables — verify against the cited paper.

Proposed method beats ToT · Avg. [GPT-OSS-120B]
83.0 vs 79.9
Adversarial Training for Process Reward Models
Proposed method beats ToT · Avg. [GPT-OSS-20B]
85.0 vs 80.7
Adversarial Training for Process Reward Models
Proposed method beats ToT · Avg. [Gemma-3-27B]
85.2 vs 80.2
Adversarial Training for Process Reward Models
Proposed method beats ToT · Avg. [Gemma-3-12B]
58.6 vs 55.3
Adversarial Training for Process Reward Models
Proposed method beats ToT · Overall [GPT-OSS-120B]
64.0 vs 61.4
Adversarial Training for Process Reward Models
Proposed method beats ToT · Overall [GPT-OSS-20B]
63.0 vs 61.7
Adversarial Training for Process Reward Models
Proposed method beats ToT · Overall [Gemma-3-27B-IT]
63.5 vs 61.1
Adversarial Training for Process Reward Models
ReST-MCTS* beats ToT · Ave. [GLM4]
16.77 vs 15.82
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
ReST-MCTS* beats ToT · Ave. [GPT-3.5-turbo]
10.06 vs 8.44
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
ReST-MCTS* beats ToT · Ave. [LLaMA2-13B-Chat]
2.90 vs 2.37
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
ReST-MCTS* beats ToT · Ave. [GPT-3.5-Turbo]
62.31 vs 61.06
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
Direct beats ToT · Acc (%) [All models]
17.11 vs 8.85
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

Newer alternatives

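Recent methods in the same sub-problem, not yet superseded in the knowledge base. These include Marginal Sharpening, Tree-of-Thoughts, Co-ReAct, MA-CoT, and Novelty-based Tree-of-Thought Search.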