Provable Benefits of In-Tool Learning for Large Language Models
This work addresses the scalability of factual recall in AI systems, giving developers and researchers theoretical and empirical evidence for tool-augmented workflows. It is incremental, however, since it builds on existing tool-use concepts.
The paper tackles the limited factual recall of language models. It demonstrates that in-tool learning (using external retrieval) allows unbounded recall, whereas in-weight learning (memorization) is constrained by parameter count; in experiments, tool-using models outperform memorizing ones.
Tool-augmented language models, equipped with retrieval, memory, or external APIs, are reshaping AI, yet their theoretical advantages remain underexplored. In this paper, we address this gap by demonstrating the benefits of in-tool learning (external retrieval) over in-weight learning (memorization) for factual recall. We show that the number of facts a model can memorize solely in its weights is fundamentally limited by its parameter count. In contrast, we prove that tool use enables unbounded factual recall via a simple and efficient circuit construction. These results are validated in controlled experiments, where tool-using models consistently outperform memorizing ones. We further show that for pretrained large language models, teaching tool use and general rules is more effective than finetuning facts into memory. Our work provides both a theoretical and an empirical foundation, establishing why tool-augmented workflows are not just practical but provably more scalable.
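The contrast between in-weight and in-tool recall can be made concrete with a toy sketch. It is not the paper's circuit construction or its theorem. It assumes a generic counting argument in which each fact's answer is one of `num_answers` equally likely values and each parameter holds at most `bits_per_param` bits, so the weights can store at most (parameters × bits per parameter) / log2(answers) facts. The names `max_memorizable_facts`, `InWeightModel`, and `InToolModel` are hypothetical. A fixed slot budget stands in for parameter count, and a plain dictionary lookup stands in for the external retrieval tool.

```python
import math
from typing import Callable, Dict, Optional


# Hypothetical generic counting bound (an assumption, NOT the paper's theorem):
# storing one fact costs log2(num_answers) bits, and the weights hold at most
# num_params * bits_per_param bits in total.
def max_memorizable_facts(num_params: int, bits_per_param: float, num_answers: int) -> int:
    bits_per_fact = math.log2(num_answers)
    return math.floor(num_params * bits_per_param / bits_per_fact)


class InWeightModel:
    """Toy memorizer: a fixed number of slots stands in for the parameter budget."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.memory: Dict[str, str] = {}

    def learn(self, key: str, value: str) -> bool:
        # Once the budget is exhausted, further facts cannot be stored.
        if len(self.memory) >= self.capacity:
            return False
        self.memory[key] = value
        return True

    def recall(self, key: str) -> Optional[str]:
        return self.memory.get(key)


class InToolModel:
    """Toy tool user: stores no facts, only the fixed rule 'query the tool, copy the answer'."""

    def __init__(self, tool: Callable[[str], Optional[str]]) -> None:
        self.tool = tool  # hypothetical external retrieval function

    def recall(self, key: str) -> Optional[str]:
        return self.tool(key)


if __name__ == "__main__":
    # 10,000 synthetic facts; the memorizer only has room for 1,000 of them.
    facts = {f"entity_{i}": f"attr_{i % 97}" for i in range(10_000)}

    memorizer = InWeightModel(capacity=1_000)
    stored = sum(memorizer.learn(k, v) for k, v in facts.items())

    # A dictionary lookup plays the role of the external database.
    tool_user = InToolModel(tool=facts.get)

    in_weight_acc = sum(memorizer.recall(k) == v for k, v in facts.items()) / len(facts)
    in_tool_acc = sum(tool_user.recall(k) == v for k, v in facts.items()) / len(facts)

    print(stored, in_weight_acc, in_tool_acc)  # 1000 0.1 1.0
    # Assumed values: 1M parameters, 2 bits each, 1024 possible answers per fact.
    print(max_memorizable_facts(num_params=10**6, bits_per_param=2.0, num_answers=1024))  # 200000
```

The point of the toy is that the tool user's "weights" (its single lookup rule) do not grow with the number of facts, so its recall is bounded only by the external store, while the memorizer's accuracy drops as soon as the facts exceed its budget.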