Method Drift

Living systematic review

Tool use / function calling

Teaching LLMs to call external tools and APIs: function calling, tool selection and retrieval, and tool-augmented agents.

52 papers · 79 critique receipts · 186 benchmark results · updated Jun 18, 2026

Most-superseded baselines

Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.

  1.
    ReAct · ReAct
    ReAct: Synergizing Reasoning and Acting in Language Models

    6 papers critique it · 4 beat it on benchmarks

  2.
    ToolLLM · ToolLLM
    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    4 papers critique it · 3 beat it on benchmarks

  3.
    ToolACE · ReAct
    ToolACE: Winning the Points of LLM Function Calling

    2 papers critique it · 4 beat it on benchmarks

  4.
    API-Bank · ReAct
    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

    5 papers critique it · 0 beat it on benchmarks

  5.
    Gorilla · ToolLLM
    Gorilla: Large Language Model Connected with Massive APIs

    3 papers critique it · 1 beats it on benchmarks

  6. 7
    StepTool · StepTool

    2 papers critique it · 1 beats it on benchmarks

  7. 8
    Toolformer · Toolformer
    Toolformer: Language Models Can Teach Themselves to Use Tools

    2 papers critique it · 1 beats it on benchmarks

  8. 10
    xLAM · ReAct
    xLAM: A Family of Large Action Models to Empower AI Agent Systems

    0 papers critique it · 2 beat it on benchmarks

  9. 11
    ART · ART
    ART: Automatic multi-step reasoning and tool-use for large language models

    1 paper critiques it · 1 beats it on benchmarks

  10. 12
    ExpeL · ART
    ExpeL: LLM Agents Are Experiential Learners

    1 paper critiques it · 1 beats it on benchmarks
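
The ranking rule stated at the top of this list (count the distinct papers that critique or beat each method) could be sketched as follows. This is only an illustration under assumptions: the function name `rank_baselines`, the input shape (a mapping from method to a set of paper IDs), the alphabetical tie-break, and the placeholder names `method_a`, `p1`, and so on are all hypothetical and do not come from the knowledge base.

```python
def rank_baselines(critiques, beats):
    """Rank methods by the number of distinct papers that critique or beat them.

    critiques, beats: dict mapping method name -> set of paper IDs (assumed shape).
    Returns a list of (method, distinct_paper_count), highest count first.
    """
    methods = set(critiques) | set(beats)
    scored = []
    for method in methods:
        # A paper that both critiques and beats a method is counted once.
        papers = critiques.get(method, set()) | beats.get(method, set())
        scored.append((method, len(papers)))
    # Sort by count descending; break ties alphabetically (an assumption).
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Placeholder data, not real entries from the review.
critiques = {"method_a": {"p1", "p2"}, "method_b": {"p3"}}
beats = {"method_a": {"p2", "p4"}, "method_c": {"p5"}}
ranking = rank_baselines(critiques, beats)
```

Taking the set union matters here: the per-method "critique" and "beat" counts shown in the list may overlap, so adding them together would not necessarily give the number of distinct papers.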

Sub-problems

Methods that compete on the same benchmarks cluster into distinct sub-problems.

ReAct · 24 methods

ReAct · ToolACE · API-Bank · APIGen · xLAM · StableToolBench

ToolLLM · 10 methods

ToolLLM · Gorilla · ToolAlpaca · Less-is-More · TinyAgent · ToolPlanner

StepTool · 9 methods

StepTool · ToolRL · CodeAct · Search-R1 · Search-R1 PPO · R2IF

AgentAuditor · 6 methods

AgentAuditor · AGrail · GuardAgent · LlamaFirewall · ShieldAgent · ToolSafe

GRPO · 5 methods

GRPO · SAGE · Reflexion · Reinforced Agent · RC-GRPO

Toolformer · 4 methods

Toolformer · CAST · CostBench · ToolAlign

ReTool · 4 methods

ReTool · SWiRL · CoCoDA · SPaRK

Mem0 · 4 methods

Mem0 · NLSI · PEToolLLM · PRefine
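
One way to picture how methods that share benchmarks could fall into clusters like the ones above is to treat each shared benchmark as a link and take connected components. The page does not say which clustering procedure it uses, so this is a sketch under that assumption; `cluster_by_benchmark`, the `(method, benchmark)` pair format, and the names `m1` and `b1` are hypothetical.

```python
from collections import defaultdict


def cluster_by_benchmark(results):
    """Group methods into connected components linked by shared benchmarks.

    results: iterable of (method, benchmark) pairs (assumed input shape).
    Returns a list of sorted method lists, largest cluster first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_on = {}  # benchmark -> first method seen reporting on it
    for method, benchmark in results:
        find(method)  # register the method even if it shares no benchmark
        if benchmark in first_on:
            union(first_on[benchmark], method)
        else:
            first_on[benchmark] = method

    groups = defaultdict(set)
    for method in list(parent):
        groups[find(method)].add(method)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Placeholder data, not real benchmark results from the review.
pairs = [("m1", "b1"), ("m2", "b1"), ("m2", "b2"), ("m3", "b2"), ("m4", "b3")]
clusters = cluster_by_benchmark(pairs)
```

In this toy input, `m1` and `m3` never share a benchmark directly but end up in the same cluster through `m2`, which is the kind of transitive grouping a connected-components approach would produce.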

The frontier

Recent methods not yet superseded in the knowledge base.