Is API-Bank superseded?

API-Bank (Tool use / function calling): heavily superseded, a standard baseline that newer methods routinely beat. 5 papers critique it and 0 beat it on benchmarks, placing it #4 of 55 most-superseded. Sub-problem: cluster led by ReAct. Newer alternatives in the same sub-problem include GenesisFunc and Think-Augmented Function Calling (TAFC).

Heavily superseded · #4 of 55 most-superseded

API-Bank

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Tool use / function calling · first seen Apr 14, 2023

What papers say

Verbatim critique sentences, each from a paper that cites API-Bank as a baseline.

“tested Plan–Retrieve–Call behavior but lacked a unified protocol abstraction”
— MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use
“API-Bank assesses tool-augmented models but limits API candidates to fewer than five per task.”
— CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing
“Although API-Bank contains multi-turn interactions, the number of turns in each dialogue is limited (2.84 on average), and the interactions are relatively simplistic.”
— ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
“Most public benchmarks still overlook other enterprise-grade challenges, notably distinguishing among near-duplicate tools, proactively eliciting mandatory arguments, and detecting or preventing tool-call hallucinations, shortcomings our framework is expressly designed to remedy.”
— Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
“API-Bank, ToolTalk and τ-bench does include a set of tools to modify world states, but does not study the impact of state dependencies.”
— ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.