Tracked
ToolAlign
Tool use / function calling
present, but with little supersession signal in the knowledge base
0 papers critique it · 1 beats it on benchmarks
Beaten on benchmarks
Head-to-head results in which a newer method reports beating ToolAlign. Values are copied from the source paper's tables; verify them against the cited paper.
- Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
Qwen2.5-7B-Instruct-CAST beats ToolAlign · BFCLv2 Overall [Qwen2.5-7B-Instruct]
88.43 vs 71.16
- Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
Qwen2.5-7B-Instruct-CAST beats ToolAlign · ToolBench Pass [Qwen2.5-7B-Instruct]
80.67 vs 46.78
- Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
Qwen2.5-7B-Instruct-CAST beats ToolAlign · ToolBench Win [Qwen2.5-7B-Instruct]
79.43 vs 22.36
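To make the size of each reported gap easier to compare, the three rows above can be turned into absolute margins. This is a minimal sketch, not part of the tracked paper: the names `RESULTS` and `margins` are hypothetical. It assumes all three metrics share a scale on which higher is better, which this page does not state.

```python
# Scores copied from the entries above: (benchmark, CAST score, ToolAlign score).
# Assumption: both scores in a row are on the same scale, higher is better.
RESULTS = [
    ("BFCLv2 Overall", 88.43, 71.16),
    ("ToolBench Pass", 80.67, 46.78),
    ("ToolBench Win", 79.43, 22.36),
]


def margins(results):
    """Return (benchmark, absolute gap in points) for each head-to-head row."""
    # Round to two decimals to match the precision of the reported scores.
    return [(name, round(new - old, 2)) for name, new, old in results]


for name, gap in margins(RESULTS):
    print(f"{name}: +{gap:.2f} points over ToolAlign")
```

The margins are only as reliable as the copied values, so the same caveat applies: check them against the cited paper's tables before relying on them.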
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.