Natural Language Tools: A Natural Language Approach to Tool Calling In Large Language Agents
This work addresses tool-calling performance degradation in LLMs and particularly benefits open-weight models. It appears incremental, however, because it modifies an existing framework rather than introducing a new paradigm.
The paper replaces programmatic JSON tool-call outputs in large language models with natural language, which eliminates the task interference and format constraints that degrade tool calling. The result is an 18.4 percentage point improvement in tool-calling accuracy and a 70% reduction in output variance across 10 models and 6,400 trials.
We present Natural Language Tools (NLT), a framework that replaces programmatic JSON tool calling in large language models (LLMs) with natural language outputs. By decoupling tool selection from response generation, NLT eliminates task interference and format constraints that degrade tool call performance. When evaluated across 10 models and 6,400 trials spanning customer service and mental health domains, NLT improves tool calling accuracy by 18.4 percentage points while reducing output variance by 70%. Open-weight models see the largest gains, surpassing flagship closed-weight alternatives, with implications for model training in both reinforcement learning and supervised fine-tuning stages. These improvements persist under prompt perturbations and extend tool-calling capabilities to models lacking native support.
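To make the idea of replacing JSON tool calls with natural language outputs concrete, the following is a minimal sketch, not the paper's implementation. It assumes the model is prompted to list every available tool followed by YES or NO, and that a small parser turns that plain-text reply into tool decisions. The tool names, the prompt wording, the "name - YES/NO" line format, and the helper functions `build_selector_prompt` and `parse_selection` are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical tool names for illustration; not taken from the paper.
TOOLS = ["check_order_status", "issue_refund", "escalate_to_human"]


def build_selector_prompt(tools: List[str]) -> str:
    """Ask the model to judge each tool in plain language rather than emit JSON."""
    lines = [
        "Decide which tools are needed for the user's message.",
        "For every tool, write its name, a dash, and YES or NO:",
    ]
    lines += [f"{name} - YES or NO" for name in tools]
    return "\n".join(lines)


def parse_selection(reply: str, tools: List[str]) -> Dict[str, bool]:
    """Map each known tool to True/False; tools absent from the reply stay False."""
    decisions = {name: False for name in tools}
    for name in tools:
        # Assumed reply format: "<tool name> - YES" or "<tool name>: no".
        match = re.search(
            rf"{re.escape(name)}\s*[-:]\s*(YES|NO)\b", reply, re.IGNORECASE
        )
        if match:
            decisions[name] = match.group(1).upper() == "YES"
    return decisions


# A made-up model reply: free-form reasoning followed by one line per tool.
reply = """The user wants their money back for a late order.
check_order_status - YES
issue_refund - yes
escalate_to_human - NO"""

selected = [tool for tool, use in parse_selection(reply, TOOLS).items() if use]
print(selected)  # ['check_order_status', 'issue_refund']
```

In this sketch the model never has to produce valid JSON, so a formatting slip cannot invalidate the call. The selected tools would then be executed by ordinary code, and a separate step would generate the user-facing response, which is one way to read the abstract's "decoupling tool selection from response generation".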