ReAct
ReAct: Synergizing Reasoning and Acting in Language Models
Tool use / function calling · first seen Oct 6, 2022
heavily superseded — a standard baseline that newer methods routinely beat
6 papers critique it · 4 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites ReAct as a baseline.
“ReAct [Yao2023ReAct], Auto-GPT [Richards2023AutoGPT], and GAIA [mialon2023gaia] explored the interaction between reasoning and acting, though often in synthetic or text-only environments”
— MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use

“ReAct combines reasoning with API calls for multi-step tasks, but its performance is constrained by pretraining and degrades with increased tool complexity.”
— GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

“These results imply that curated data and retrieval augmentation, not sheer parameter count, are the present keys to dependable LLM tool use.”
— Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

“they operate at the agent level rather than providing fine-grained reasoning for individual function parameters”
— Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning

“The dominant approach for orchestrating tool usage relies on reactive, step-by-step reasoning frameworks like ReAct, often augmented by self-reflection techniques. However, this paradigm suffers from inherent local optimization traps due to its incremental decision-making process. While potentially effective for simple queries, its reactive nature often falters on complex tasks.”
— Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning

“These studies substantially advance reasoning control and tool-use alignment, but they generally treat reasoning depth and execution structure as separate concerns rather than as jointly case-conditioned aspects of the same problem.”
— Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
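Several of the critiques above target the step-by-step character of ReAct-style agents, so a small illustration of that loop may help. The following is a minimal sketch, not the implementation from the ReAct paper: the tool registry `TOOLS`, the `scripted_policy` stand-in for a language model call, and the `react_loop` helper are all hypothetical names introduced here. It only shows the thought, action, observation cycle in which each step is chosen from the transcript so far, with no plan made up front.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; the tool name and its canned answer are illustrative.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}

Step = Tuple[str, str, str]  # (thought, action, argument)


def scripted_policy(history: List[str]) -> Step:
    """Stand-in for a language model call (an assumption of this sketch).

    It sees only the transcript so far and picks the single next step,
    which is the incremental decision-making the critiques describe.
    """
    prefix = "Observation: "
    observations = [h for h in history if h.startswith(prefix)]
    if not observations:
        return ("I need to look this up.", "lookup", "capital of France")
    return ("I have what I need.", "finish", observations[-1][len(prefix):])


def react_loop(
    question: str,
    policy: Callable[[List[str]], Step],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate thought/action/observation until 'finish' or the step budget ends."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(f"Thought: {thought}")
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, history
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool {action!r}"
        history.append(f"Observation: {observation}")
    return "", history  # step budget exhausted without an answer


answer, trace = react_loop("What is the capital of France?", scripted_policy)
for line in trace:
    print(line)
```

Because the policy is consulted once per step and only ever returns the next action, a poor early choice is never revisited unless a later step happens to correct it. That is one way to read the "local optimization" concern raised in the planner-centric critique.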
Beaten on benchmarks
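Head-to-head results where a newer method reports beating ReAct. Each pair is listed as newer method vs ReAct. Note that "beats" depends on the metric: for ASR (attack success rate) a lower value is better, while for Utility a higher value is better. Values are copied from the source paper's tables; verify against the cited paper.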
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · ASR [GPT-4o as Agent Backbone, AgentDojo]
1.16 vs 56.16
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Utility [GPT-4o as Agent Backbone, AgentDojo]
42.78 vs 26.87
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · ASR [GPT-4o as Agent Backbone, ASB-DPI]
6.76 vs 82.25
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Utility [GPT-4o as Agent Backbone, ASB-DPI]
18.87 vs 12.50
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · ASR [GPT-4o as Agent Backbone, ASB-IPI]
6.19 vs 80.00
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Utility [GPT-4o as Agent Backbone, ASB-IPI]
49.01 vs 48.00
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Refusal [GPT-4o as Agent Backbone, AgentHarm]
94.32 vs 62.50
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Score [GPT-4o as Agent Backbone, AgentHarm]
6.03 vs 23.53
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · ASR [Qwen2.5-14B-Instruct as Agent Backbone, AgentDojo]
1.79 vs 17.59
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Utility [Qwen2.5-14B-Instruct as Agent Backbone, AgentDojo]
42.72 vs 42.57
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · ASR [Qwen2.5-14B-Instruct as Agent Backbone, ASB-DPI]
7.25 vs 95.25
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ReAct-TS-Flow (TS-Guard) beats ReAct · Utility [Qwen2.5-14B-Instruct as Agent Backbone, ASB-DPI]
30.00 vs 18.75
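The pairs above mix metrics that point in opposite directions, so the size of each win is easier to compare once the sign is normalized. The sketch below copies the twelve value pairs from the list and computes a signed margin for each. The `Result` type, the `improvement` helper, and in particular the `LOWER_IS_BETTER` set are assumptions of this example: it treats ASR and the AgentHarm Score as lower-is-better and Utility and Refusal as higher-is-better, which should be checked against the cited paper.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    backbone: str
    benchmark: str
    metric: str
    newer: float  # ReAct-TS-Flow (TS-Guard)
    react: float  # ReAct baseline


# Assumption: these metrics improve as they decrease; verify against the paper.
LOWER_IS_BETTER = {"ASR", "Score"}

# Values copied from the list above (newer method vs ReAct).
RESULTS: List[Result] = [
    Result("GPT-4o", "AgentDojo", "ASR", 1.16, 56.16),
    Result("GPT-4o", "AgentDojo", "Utility", 42.78, 26.87),
    Result("GPT-4o", "ASB-DPI", "ASR", 6.76, 82.25),
    Result("GPT-4o", "ASB-DPI", "Utility", 18.87, 12.50),
    Result("GPT-4o", "ASB-IPI", "ASR", 6.19, 80.00),
    Result("GPT-4o", "ASB-IPI", "Utility", 49.01, 48.00),
    Result("GPT-4o", "AgentHarm", "Refusal", 94.32, 62.50),
    Result("GPT-4o", "AgentHarm", "Score", 6.03, 23.53),
    Result("Qwen2.5-14B-Instruct", "AgentDojo", "ASR", 1.79, 17.59),
    Result("Qwen2.5-14B-Instruct", "AgentDojo", "Utility", 42.72, 42.57),
    Result("Qwen2.5-14B-Instruct", "ASB-DPI", "ASR", 7.25, 95.25),
    Result("Qwen2.5-14B-Instruct", "ASB-DPI", "Utility", 30.00, 18.75),
]


def improvement(r: Result) -> float:
    """Signed margin of the newer method; positive means it is better than ReAct."""
    delta = r.newer - r.react
    return -delta if r.metric in LOWER_IS_BETTER else delta


# Print rows from the narrowest margin to the widest.
for r in sorted(RESULTS, key=improvement):
    print(f"{r.backbone:22s} {r.benchmark:10s} {r.metric:8s} {improvement(r):+7.2f}")

narrowest = min(RESULTS, key=improvement)
```

Under these assumptions every row favors the newer method, but the margins vary widely: the safety metrics move by tens of points, while the narrowest row is Utility on AgentDojo with the Qwen2.5-14B-Instruct backbone (42.72 vs 42.57).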
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.