ReAct
ReAct: Synergizing Reasoning and Acting in Language Models
LLM reasoning / chain-of-thought · first seen Oct 6, 2022
superseded — cited as a baseline and beaten by newer methods
5 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites ReAct as a baseline.
“In our initial experiments with ReAct for planning, we found that the system is overly dependent on the syntactic similarity of the example prompt and the query and is extremely brittle to minor perturbations to the input prompt.”
— On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models
“agent behavior is typically static at inference time: the model follows a fixed prompt-guided policy, while feedback gathered during deployment is not used to systematically improve future decisions”
— OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
“However, the REACT approach does not tackle efficiency in the second stage, relation modeling.”
— REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation
“we demonstrate that CoT and its reasoning variants (e.g., ToT, ReAct) consistently underperform direct answering by a significant margin”
— The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
“The dominant approach for orchestrating tool usage relies on reactive, step-by-step reasoning frameworks like ReAct, often augmented by self-reflection techniques. However, this paradigm suffers from inherent local optimization traps due to its incremental decision-making process. While potentially effective for simple queries, its reactive nature often falters on complex tasks.”
— Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning
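For readers unfamiliar with the pattern these critiques target, the following is a minimal sketch of a ReAct-style thought/action/observation loop. It is an illustration only, not code from any of the cited papers: the `TOOLS` table, `scripted_policy` (a stand-in for the language model), and `react_loop` are all hypothetical names. The point it shows is that each step is chosen from the transcript so far, with no global plan, which is the incremental behavior the last quote describes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tool set; a real agent would call search APIs, code runners, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}

Step = Tuple[str, str, str]  # (thought, action name, action argument)


def scripted_policy(transcript: List[str]) -> Step:
    """Stand-in for the LLM (assumption: a real system would prompt a model here).

    It picks the next step using only the transcript so far.
    """
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return ("I should look the answer up.", "lookup", "capital_of_france")
    answer = observations[-1][len("Observation: "):]
    return ("I have what I need.", "finish", answer)


def react_loop(
    question: str,
    policy: Callable[[List[str]], Step],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate thought -> action -> observation until 'finish' or the step budget ends."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{argument}]")
        if action == "finish":
            return argument, transcript
        # Run the chosen tool and feed its output back into the transcript.
        observation = TOOLS[action](argument)
        transcript.append(f"Observation: {observation}")
    return "", transcript  # step budget exhausted without a final answer


answer, transcript = react_loop("What is the capital of France?", scripted_policy)
print("\n".join(transcript))
```

Because the policy only ever sees the running transcript, an early poor action is never revisited unless a later step happens to correct it; planner-centric alternatives decide on a multi-step plan before acting.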
Beaten on benchmarks
Head-to-head results where a newer method reports beating ReAct. Values are copied from the source paper's tables — verify against the cited paper.
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Easy [Gemini-2.5-Pro]
87.50 vs 69.44
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Hard [Gemini-2.5-Pro]
45.24 vs 10.05
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Data Wrangling [Gemini-2.5-Pro]
30.4 vs 14.7
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · ML [Gemini-2.5-Pro]
57.3 vs 31.2
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · EDA [Gemini-2.5-Pro]
34.8 vs 22.2
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Medium [Gemini-2.5-Pro]
35.2 vs 17.9
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Total [Gemini-2.5-Pro]
38.5 vs 22.5
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Total [Original setting]
44.69 vs 30.31
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats ReAct · Total [Oracle setting]
52.55 vs 33.82
- OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
OLIVIA (Ours) beats ReAct · F1 [Qwen model]
0.585 vs 0.520
- OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
OLIVIA (Ours) beats ReAct · F1 [Mistral model]
0.420 vs 0.386
- The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
Direct beats ReAct · Acc (%) [All models]
17.11 vs 8.69
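To make the size of these gaps easier to compare, the sketch below computes the absolute and relative margin for each pair listed above. The scores are copied from the entries on this page; the `RESULTS` list and the `margins` helper are illustrative names, not part of any cited paper. Note that the rows use different metrics and scales (percent scores versus F1 on a 0 to 1 scale), so absolute margins are only comparable within a single paper.

```python
# (paper, setting, newer-method score, ReAct score), copied from the entries above.
RESULTS = [
    ("DS-STAR", "Easy [Gemini-2.5-Pro]", 87.50, 69.44),
    ("DS-STAR", "Hard [Gemini-2.5-Pro]", 45.24, 10.05),
    ("DS-STAR", "Data Wrangling [Gemini-2.5-Pro]", 30.4, 14.7),
    ("DS-STAR", "ML [Gemini-2.5-Pro]", 57.3, 31.2),
    ("DS-STAR", "EDA [Gemini-2.5-Pro]", 34.8, 22.2),
    ("DS-STAR", "Medium [Gemini-2.5-Pro]", 35.2, 17.9),
    ("DS-STAR", "Total [Gemini-2.5-Pro]", 38.5, 22.5),
    ("DS-STAR", "Total [Original setting]", 44.69, 30.31),
    ("DS-STAR", "Total [Oracle setting]", 52.55, 33.82),
    ("OLIVIA", "F1 [Qwen model]", 0.585, 0.520),
    ("OLIVIA", "F1 [Mistral model]", 0.420, 0.386),
    ("Direct", "Acc (%) [All models]", 17.11, 8.69),
]


def margins(new_score: float, react_score: float) -> tuple:
    """Return (absolute margin, relative margin in percent of the ReAct score)."""
    absolute = new_score - react_score
    relative_pct = 100.0 * absolute / react_score
    return absolute, relative_pct


for method, setting, new_score, react_score in RESULTS:
    absolute, relative_pct = margins(new_score, react_score)
    print(f"{method:8s} {setting:32s} {absolute:+8.3f}  ({relative_pct:+.1f}% relative)")
```

The relative margin is often the more informative figure when the baseline score is low, as in the Hard split, where a modest-looking baseline makes the proportional gap large.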
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.