Superseded baseline · #9 of 55 most-superseded
APIGen
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Tool use / function calling · first seen Jun 26, 2024
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites APIGen as a baseline.
“This method employs a rigorous verification process to improve data quality but is limited in scope, focusing predominantly on single-turn function-calling scenarios.”
— Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
“While APIGen~liu2024apigen, ToolACE~liu2024toolace, and DeCRIM~ferraz2024llm produce verified function-call traces from fully specified queries, and works such as Clarify-When-Necessary~zhang2023clarify theorize when to seek clarification without training a model for it, DiaFORGE unifies three mutually reinforcing contributions absent from any single prior work: (i)~disambiguation-centric synthesis that structurally obliges the assistant to navigate near-duplicate API surfaces via injected distractors and a two-phase coercive dialogue protocol; (ii)~reasoning-trace SFT that jointly teaches tool disambiguation and argument solicitation in a single multi-turn curriculum across 3--70~B parameter models; and (iii)~a dynamic agentic evaluation that redeploys fine-tuned models in a live conversational loop with a simulated user, measuring end-to-end goal completion rather than isolated turn accuracy.”
— Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
“Unlike our work, these datasets are not conversational and just focus on mapping utterances to API calls, and they do not use intermediate structures (i.e., graphs) to ensure coverage and reduce hallucinations in generated tests.”
— Automated test generation to evaluate tool-augmented LLMs as conversational AI agents
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.