
Superseded baseline · #16 of 55 most-superseded

ToolAlpaca

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Tool use / function calling · first seen Jun 8, 2023

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 1 beats it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites ToolAlpaca as a baseline.

  • the fine-tuned models from datasets like ToolLLM [qin2023toolllm], ToolAlpaca [tang2023toolalpaca], and Gorilla [patil2023gorilla] underperform in one (or more) of three key dimensions: (a) Generalizability: While the datasets are generated using diverse sets of APIs (e.g., ToolLLama uses RapidAPIs <https://rapidapi.com/hub>, ToolAlpaca uses public APIs <https://github.com/public-apis/public-apis>, and Gorilla uses TensorFlow Hub, PyTorch Hub, and Hugging Face Hub), work from [basu2024apiblend] has shown that models trained on these datasets have difficulty generalizing to out-of-domain datasets.
    Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Beaten on benchmarks

Head-to-head results where a newer method reports beating ToolAlpaca. Values are copied from the source paper's tables — verify against the cited paper.

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.