ReAct (Tool use / function calling): heavily superseded, a standard baseline that newer methods routinely beat. 6 papers critique it and 4 beat it on benchmarks, which ranks it #1 of 55 most-superseded methods. Sub-problem: cluster led by ReAct. Newer alternatives in the same sub-problem include GenesisFunc and Think-Augmented Function Calling (TAFC).

Heavily superseded · #1 of 55 most-superseded

ReAct

ReAct: Synergizing Reasoning and Acting in Language Models

Tool use / function calling · first seen Oct 6, 2022

heavily superseded — a standard baseline that newer methods routinely beat

6 papers critique it · 4 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites ReAct as a baseline.

“ReAct [Yao2023ReAct], Auto-GPT [Richards2023AutoGPT], and GAIA [mialon2023gaia] explored the interaction between reasoning and acting, though often in synthetic or text-only environments”
— MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use
“ReAct combines reasoning with API calls for multi-step tasks, but its performance is constrained by pretraining and degrades with increased tool complexity.”
— GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling
“These results imply that curated data and retrieval augmentation, not sheer parameter count, are the present keys to dependable LLM tool use.”
— Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
“they operate at the agent level rather than providing fine-grained reasoning for individual function parameters”
— Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
“The dominant approach for orchestrating tool usage relies on reactive, step-by-step reasoning frameworks like ReAct, often augmented by self-reflection techniques. However, this paradigm suffers from inherent local optimization traps due to its incremental decision-making process. While potentially effective for simple queries, its reactive nature often falters on complex tasks.”
— Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning
“These studies substantially advance reasoning control and tool-use alignment, but they generally treat reasoning depth and execution structure as separate concerns rather than as jointly case-conditioned aspects of the same problem.”
— Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
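
Several of the critiques above target ReAct's incremental thought, action, observation cycle. As a rough illustration of that cycle, here is a minimal sketch in Python. It is not the implementation from the ReAct paper: `toy_policy` is a hard-coded stand-in for the language model, the tools `lookup_population` and `add` are hypothetical, and the population figures are made-up toy values.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools, for illustration only. The figures are toy values.
def lookup_population(city: str) -> str:
    data = {"Paris": "2100000", "Rome": "2800000"}
    return data.get(city, "unknown")


def add(expr: str) -> str:
    left, right = expr.split("+")
    return str(int(left) + int(right))


TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_population": lookup_population,
    "add": add,
}


def toy_policy(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: picks the next (thought, action, argument).

    It only looks at the observations gathered so far, which mirrors the
    step-by-step decision-making the critiques describe.
    """
    observations = [
        h.split(": ", 1)[1] for h in history if h.startswith("Observation")
    ]
    if len(observations) == 0:
        return ("I need the population of Paris.", "lookup_population", "Paris")
    if len(observations) == 1:
        return ("Now I need the population of Rome.", "lookup_population", "Rome")
    if len(observations) == 2:
        return ("Add the two figures.", "add", f"{observations[0]}+{observations[1]}")
    return ("I have the total.", "finish", observations[-1])


def react_loop(question: str, max_steps: int = 8) -> Tuple[str, List[str]]:
    """Alternate thoughts and tool calls until the policy chooses 'finish'."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = toy_policy(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)  # execute the chosen tool
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {observation}")
    return "no answer", history


answer, trace = react_loop("What is the combined population of Paris and Rome?")
print("\n".join(trace))
print("Answer:", answer)
```

Each step is chosen from the history so far, with no global plan. That is the property the planner-centric critique describes as prone to local optimization traps on complex tasks.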

Beaten on benchmarks

Head-to-head results where a newer method reports beating ReAct. Values are copied from the source paper's tables — verify against the cited paper.

All rows compare ReAct-TS-Flow (TS-Guard) against ReAct.
Source: ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Agent backbone         Benchmark   Metric    ReAct-TS-Flow (TS-Guard)   ReAct
GPT-4o                 AgentDojo   ASR        1.16                      56.16
GPT-4o                 AgentDojo   Utility   42.78                      26.87
GPT-4o                 ASB-DPI     ASR        6.76                      82.25
GPT-4o                 ASB-DPI     Utility   18.87                      12.50
GPT-4o                 ASB-IPI     ASR        6.19                      80.00
GPT-4o                 ASB-IPI     Utility   49.01                      48.00
GPT-4o                 AgentHarm   Refusal   94.32                      62.50
GPT-4o                 AgentHarm   Score      6.03                      23.53
Qwen2.5-14B-Instruct   AgentDojo   ASR        1.79                      17.59
Qwen2.5-14B-Instruct   AgentDojo   Utility   42.72                      42.57
Qwen2.5-14B-Instruct   ASB-DPI     ASR        7.25                      95.25
Qwen2.5-14B-Instruct   ASB-DPI     Utility   30.00                      18.75
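
To make the size of the ASR gaps easier to read, the short Python sketch below computes the absolute and relative reduction for each ASR pair listed above. It assumes that ASR denotes attack success rate (lower is better) and that the values are percentages. Neither is stated on this page, so check both against the ToolSafe paper. The helper name `asr_reduction` is hypothetical.

```python
# (agent backbone, benchmark, ASR with ReAct-TS-Flow, ASR with ReAct),
# copied from the results listed above.
ASR_ROWS = [
    ("GPT-4o", "AgentDojo", 1.16, 56.16),
    ("GPT-4o", "ASB-DPI", 6.76, 82.25),
    ("GPT-4o", "ASB-IPI", 6.19, 80.00),
    ("Qwen2.5-14B-Instruct", "AgentDojo", 1.79, 17.59),
    ("Qwen2.5-14B-Instruct", "ASB-DPI", 7.25, 95.25),
]


def asr_reduction(guarded: float, baseline: float) -> tuple:
    """Return (absolute drop in points, relative drop in percent).

    Assumes lower ASR is better, so a positive drop favors the guarded agent.
    """
    absolute = round(baseline - guarded, 2)
    relative = round(100.0 * (baseline - guarded) / baseline, 1)
    return absolute, relative


summary = [
    (backbone, benchmark, *asr_reduction(guarded, baseline))
    for backbone, benchmark, guarded, baseline in ASR_ROWS
]

for backbone, benchmark, absolute, relative in summary:
    print(f"{backbone:22s} {benchmark:10s} -{absolute:.2f} points ({relative:.1f}% lower)")
```

The Utility, Refusal, and Score rows are left out because the direction of improvement differs by metric and is not spelled out here.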

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.