Superseded baseline · #16 of 55 most-superseded
ToolAlpaca
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Tool use / function calling · first seen Jun 8, 2023
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 1 beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites ToolAlpaca as a baseline.
“the fine-tuned models from datasets like ToolLLM~qin2023toolllm, ToolAlpaca~tang2023toolalpaca, and Gorilla~patil2023gorilla underperform in one (or more) of three key dimensions: (a) Generalizability: While the datasets are generated using diverse sets of APIs (e.g., ToolLLama uses RapidAPIs~{https://rapidapi.com/hub}, ToolAlpaca uses public APIs{https://github.com/public-apis/public-apis}, and Gorilla uses TensorFlow Hub, PyTorch Hub, and Hugging Face Hub), work from~basu2024apiblend has shown that models trained on these datasets have difficulty generalizing to out-of-domain datasets.”
— Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Beaten on benchmarks
Head-to-head results where a newer method reports beating ToolAlpaca. Values are copied from the source paper's tables — verify against the cited paper.
- ASA: Activation Steering for Tool-Calling Domain Adaptation
ASA beats ToolAlpaca · Overall First Call Accuracy [NESTFUL evaluation]
41.94 vs 28.23
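To put the two figures above in context, the gap can be expressed both in points and as a relative improvement. The snippet below is a minimal sketch that assumes both values are accuracy percentages measured on the same scale and setup; the helper name `score_gap` is hypothetical and is not taken from either paper.

```python
def score_gap(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gap in points, relative improvement in percent)."""
    absolute = new - old
    relative = 100.0 * absolute / old
    return round(absolute, 2), round(relative, 1)


# Values copied from the entry above (assumed to be percentages).
asa, toolalpaca = 41.94, 28.23
gap_points, gap_percent = score_gap(asa, toolalpaca)
print(f"{gap_points} points, {gap_percent}% relative improvement")
```

Under that assumption the difference is 13.71 points, roughly a 48.6% relative improvement over the ToolAlpaca score. As with the raw values, check this against the cited paper before relying on it.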
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.