Heavily superseded · #4 of 55 most-superseded
API-Bank
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
Tool use / function calling · first seen Apr 14, 2023
heavily superseded — a standard baseline that newer methods routinely beat
5 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites API-Bank as a baseline.
“tested Plan–Retrieve–Call behavior but lacked a unified protocol abstraction”
— MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use
“API-Bank assesses tool-augmented models but limits API candidates to fewer than five per task.”
— CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing
“Although API-Bank contains multi-turn interactions, the number of turns in each dialogue is limited (2.84 on average), and the interactions are relatively simplistic.”
— ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
“Most public benchmarks still overlook other enterprise-grade challenges, notably distinguishing among near-duplicate tools, proactively eliciting mandatory arguments, and detecting or preventing tool-call hallucinations, shortcomings our framework is expressly designed to remedy.”
— Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
“API-Bank (Li et al., 2023), ToolTalk (Farn and Shin, 2023) and τ-bench (Yao et al., 2024) does include a set of tools to modify world states, but does not study the impact of state dependencies.”
— ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.