
Heavily superseded · #3 of 55 most-superseded

ToolACE

ToolACE: Winning the Points of LLM Function Calling

Tool use / function calling · first seen Sep 2, 2024

Heavily superseded: a standard baseline that newer methods routinely beat.

2 papers critique it · 4 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites ToolACE as a baseline.

  • Prior works often rely on annotated or synthetic APIs, which lack reliability and struggle to scale across larger tool sets. These approaches also face limitations in diversity, quality, and coverage.
    GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling
  • While APIGen [liu2024apigen], ToolACE [liu2024toolace], and DeCRIM [ferraz2024llm] produce verified function-call traces from fully specified queries, and works such as Clarify-When-Necessary [zhang2023clarify] theorize when to seek clarification without training a model for it, DiaFORGE unifies three mutually reinforcing contributions absent from any single prior work: (i) disambiguation-centric synthesis that structurally obliges the assistant to navigate near-duplicate API surfaces via injected distractors and a two-phase coercive dialogue protocol; (ii) reasoning-trace SFT that jointly teaches tool disambiguation and argument solicitation in a single multi-turn curriculum across 3–70B parameter models; and (iii) a dynamic agentic evaluation that redeploys fine-tuned models in a live conversational loop with a simulated user, measuring end-to-end goal completion rather than isolated turn accuracy.
    Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
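Both critiques contrast "verified" function-call traces built from fully specified queries with harder settings involving near-duplicate tools. The sketch below is a minimal illustration of rule-based checking of a single call against a tool spec. It assumes a hypothetical schema format in which each parameter maps to a (type, required) pair. `TOOLS`, `verify_call`, and the two weather tools are illustrative names only. This is not ToolACE's actual verification pipeline or any library's API.

```python
from typing import Any

# Hypothetical tool schema (an assumption, not ToolACE's format):
# tool name -> {parameter name: (expected Python type, is_required)}
ToolSpec = dict[str, dict[str, tuple[type, bool]]]

TOOLS: ToolSpec = {
    "get_weather": {"city": (str, True), "days": (int, False)},
    "get_forecast": {"city": (str, True), "hours": (int, True)},
}


def verify_call(call: dict[str, Any], tools: ToolSpec) -> list[str]:
    """Return rule violations for one call; an empty list means it passes."""
    name = call.get("name")
    if name not in tools:
        return [f"unknown tool: {name!r}"]
    spec = tools[name]
    args = call.get("arguments", {})
    errors: list[str] = []
    for param, (expected_type, required) in spec.items():
        if param not in args:
            if required:
                errors.append(f"missing required argument: {param}")
        # Exact type match, so a bool is not accepted where an int is expected.
        elif type(args[param]) is not expected_type:
            errors.append(
                f"{param}: expected {expected_type.__name__}, "
                f"got {type(args[param]).__name__}"
            )
    # Flag arguments that the schema does not declare.
    for param in args:
        if param not in spec:
            errors.append(f"unexpected argument: {param}")
    return errors


print(verify_call({"name": "get_weather", "arguments": {"city": "Paris"}}, TOOLS))
# → []
print(verify_call({"name": "get_forecast", "arguments": {"city": "Paris"}}, TOOLS))
# → ['missing required argument: hours']
```

A check of this kind confirms that a call is well formed. It cannot tell whether `get_weather` or `get_forecast` was the right choice for an ambiguous request. That tool-disambiguation gap is the one the DiaFORGE critique targets.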

Beaten on benchmarks

Head-to-head results where a newer method reports beating ToolACE. Values are copied from the source paper's tables; verify them against the cited paper.

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.