Toolformer
Toolformer: Language Models Can Teach Themselves to Use Tools
Tool use / function calling · first seen Feb 9, 2023
Superseded: cited as a baseline and beaten by newer methods
2 papers critique it · 1 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Toolformer as a baseline.
“Toolformer focuses on when to invoke tools rather than reasoning about fine-grained tool costs or long-term expenditure.”
(CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents)
“These studies substantially advance reasoning control and tool-use alignment, but they generally treat reasoning depth and execution structure as separate concerns rather than as jointly case-conditioned aspects of the same problem.”
(Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use)
Beaten on benchmarks
Head-to-head results where a newer method reports beating Toolformer. Values are copied from the source paper's tables; verify them against the cited paper.
- Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use, setting [Qwen2.5-7B-Instruct]: Qwen2.5-7B-Instruct-CAST beats Toolformer on
  - BFCLv2 Overall: 88.43 vs 67.07
  - ToolBench Pass: 80.67 vs 48.92
  - ToolBench Win: 79.43 vs 22.11
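To make the size of each margin explicit, here is a small sketch that computes the absolute point gap and the relative gain for each reported result. The scores are hard-coded from the comparisons quoted above (CAST first, Toolformer second); the names `RESULTS` and `gaps` are illustrative only, and any correction to the source tables would have to be made in the dictionary as well.

```python
# Scores hard-coded from the head-to-head results quoted on this page,
# as (Qwen2.5-7B-Instruct-CAST, Toolformer); re-check against the source paper.
RESULTS = {
    "BFCLv2 Overall": (88.43, 67.07),
    "ToolBench Pass": (80.67, 48.92),
    "ToolBench Win": (79.43, 22.11),
}


def gaps(results):
    """Return {benchmark: (absolute point gap, relative gain in percent)}."""
    out = {}
    for name, (cast, toolformer) in results.items():
        diff = cast - toolformer
        # Round to the precision of the reported scores.
        out[name] = (round(diff, 2), round(100 * diff / toolformer, 1))
    return out


if __name__ == "__main__":
    for name, (diff, rel) in gaps(RESULTS).items():
        print(f"{name}: +{diff:.2f} points ({rel:+.1f}% relative)")
```

The absolute gaps work out to 21.36, 31.75 and 57.32 points; the relative figure is most striking on ToolBench Win, where the Toolformer baseline starts from a low score.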
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.