ReAct (LLM reasoning / chain-of-thought) is classed as superseded: it is cited as a baseline and beaten by newer methods. 5 papers critique it and 3 beat it on benchmarks, which ranks it #7 of 772 most-superseded methods. It leads its sub-problem cluster; newer alternatives in the same sub-problem include OLIVIA, the Planner-centric Plan-Execute paradigm, SR^2, and DS-STAR.


Superseded baseline · #7 of 772 most-superseded

ReAct

ReAct: Synergizing Reasoning and Acting in Language Models

LLM reasoning / chain-of-thought · first seen Oct 6, 2022

superseded — cited as a baseline and beaten by newer methods

5 papers critique it · 3 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites ReAct as a baseline.

“In our initial experiments with ReAct for planning, we found that the system is overly dependent on the syntactic similarity of the example prompt and the query and is extremely brittle to minor perturbations to the input prompt.”
— On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models
“agent behavior is typically static at inference time: the model follows a fixed prompt-guided policy, while feedback gathered during deployment is not used to systematically improve future decisions”
— OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
“However, the REACT approach does not tackle efficiency in the second stage, relation modeling.”
— REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation
“we demonstrate that CoT and its reasoning variants (e.g., ToT, ReAct) consistently underperform direct answering by a significant margin”
— The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
“The dominant approach for orchestrating tool usage relies on reactive, step-by-step reasoning frameworks like ReAct, often augmented by self-reflection techniques. However, this paradigm suffers from inherent local optimization traps due to its incremental decision-making process. While potentially effective for simple queries, its reactive nature often falters on complex tasks.”
— Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning
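
The incremental, step-by-step decision process that several of these critiques target can be sketched as a loop that alternates a model step (a thought plus an action) with a tool observation. This is a minimal illustration only, not the implementation from the ReAct paper or from any cited work: `TOOLS`, `scripted_policy`, and `react_loop` are hypothetical names, and the policy is a hard-coded stand-in for a language model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function from string input to string output.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "unknown"),
}

Step = Tuple[str, str, str]  # (thought, action name, action input)


def scripted_policy(question: str, observations: List[str]) -> Step:
    """Stand-in for an LLM call: picks the next step from what has been observed so far."""
    if not observations:
        return ("I should look this up.", "lookup", question)
    return ("I have the answer.", "finish", observations[-1])


def react_loop(
    question: str,
    policy: Callable[[str, List[str]], Step] = scripted_policy,
    max_steps: int = 5,
) -> str:
    """Alternate thought/action with observation until the policy chooses to finish."""
    observations: List[str] = []
    for _ in range(max_steps):
        # The thought is free text; this sketch does not feed it back to the policy.
        _thought, action, action_input = policy(question, observations)
        if action == "finish":
            return action_input
        # Each decision is made from the observations so far; there is no global plan.
        observations.append(TOOLS[action](action_input))
    raise RuntimeError("no answer within max_steps")


print(react_loop("capital of France"))  # prints "Paris"
```

Because every step is chosen from the transcript so far, the loop has no point at which a whole plan is drafted or revised, which is the property the planner-centric critique above describes as a local optimization trap.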

Beaten on benchmarks

Head-to-head results where a newer method reports beating ReAct. Values are copied from the source paper's tables — verify against the cited paper.

DS-STAR beats ReAct · Easy [Gemini-2.5-Pro]
87.50 vs 69.44
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · Hard [Gemini-2.5-Pro]
45.24 vs 10.05
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · Data Wrangling [Gemini-2.5-Pro]
30.4 vs 14.7
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · ML [Gemini-2.5-Pro]
57.3 vs 31.2
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · EDA [Gemini-2.5-Pro]
34.8 vs 22.2
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · Medium [Gemini-2.5-Pro]
35.2 vs 17.9
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · Total [Gemini-2.5-Pro]
38.5 vs 22.5
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · Total [Original setting]
44.69 vs 30.31
DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR beats ReAct · Total [Oracle setting]
52.55 vs 33.82
DS-STAR: Data Science Agent via Iterative Planning and Verification
OLIVIA beats ReAct · F1 [Qwen model]
0.585 vs 0.520
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
OLIVIA beats ReAct · F1 [Mistral model]
0.420 vs 0.386
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
Direct beats ReAct · Acc (%) [All models]
17.11 vs 8.69
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
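
To read these rows on a common footing, the absolute and relative margins can be computed directly from each reported pair. The sketch below is illustrative: `Margin`, `margin`, and `ROWS` are hypothetical names, and the three value pairs are copied from the rows above, so they carry the same need for verification against the cited papers.

```python
from typing import NamedTuple


class Margin(NamedTuple):
    absolute: float  # newer score minus ReAct score, in the metric's own units
    relative: float  # absolute margin divided by the ReAct score


def margin(newer: float, react: float) -> Margin:
    """Return the absolute and relative margin of a newer method over ReAct."""
    if react == 0:
        raise ValueError("relative margin is undefined for a zero baseline")
    diff = newer - react
    return Margin(absolute=diff, relative=diff / react)


# (label, newer-method score, ReAct score), copied from the rows above.
ROWS = [
    ("DS-STAR, Hard [Gemini-2.5-Pro]", 45.24, 10.05),
    ("OLIVIA, F1 [Qwen model]", 0.585, 0.520),
    ("Direct, Acc (%) [All models]", 17.11, 8.69),
]

for label, newer, react in ROWS:
    m = margin(newer, react)
    print(f"{label}: {m.absolute:+.3f} absolute, {m.relative:+.1%} relative")
```

Absolute margins are not comparable across rows that use different scales (F1 on a 0 to 1 scale versus accuracy in percent), which is why the sketch reports the relative margin alongside them.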

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.