ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
The paper addresses the challenge of tool learning for compact language models and offers a scalable route toward embodied intelligence with minimal human intervention, although its improvement over existing methods is incremental.
The paper asks whether smaller language models can use real-world tools effectively without tool-specific training. It introduces ToolAlpaca, a framework that automatically generates a diverse corpus of 3938 tool-use instances and uses it to fine-tune compact models, which then reach generalized tool-use capabilities comparable to GPT-3.5.
Enabling large language models to utilize real-world tools effectively is crucial for achieving embodied intelligence. Existing approaches to tool learning have either relied primarily on extremely large language models, such as GPT-4, to attain generalized tool-use abilities in a zero-shot manner, or used supervised learning to train compact models on a limited scope of tools. However, it remains uncertain whether smaller language models can achieve generalized tool-use abilities without tool-specific training. To address this question, this paper introduces ToolAlpaca, a novel framework designed to automatically generate a diverse tool-use corpus and learn generalized tool-use abilities on compact language models with minimal human intervention. Specifically, ToolAlpaca first automatically creates a highly diversified tool-use corpus by building a multi-agent simulation environment. The corpus contains 3938 tool-use instances from more than 400 real-world tool APIs spanning 50 distinct categories. The constructed corpus is then used to fine-tune compact language models, yielding two models: ToolAlpaca-7B and ToolAlpaca-13B. Finally, we evaluate the ability of these models to utilize previously unseen tools without specific training. Experimental results show that ToolAlpaca achieves effective generalized tool-use capabilities comparable to those of extremely large language models like GPT-3.5, indicating that learning generalized tool-use ability is feasible for compact language models.
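To make the corpus-generation step more concrete, here is a minimal sketch of how a multi-agent simulation could turn one tool API into a single tool-use instance. It assumes three roles: a user agent that writes an instruction, an assistant agent that decides which function to call or when to answer, and a tool executor that simulates the API's response. The class names, the `simulate` function, and the stub agents are all hypothetical. In a real pipeline each agent would wrap a prompted LLM rather than the deterministic stubs used here, and this is not presented as the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolAPI:
    """Hypothetical description of one tool API in the corpus."""
    name: str
    category: str
    functions: list[str]


@dataclass
class Step:
    action: str          # function the assistant chose to call
    action_input: dict   # arguments passed to that function
    observation: str     # simulated response produced by the tool executor


@dataclass
class Instance:
    """One tool-use instance: an instruction, the tool calls, and a final answer."""
    tool: str
    instruction: str
    steps: list[Step] = field(default_factory=list)
    response: str = ""   # stays empty if the turn budget runs out


# Hypothetical agent signatures; each would normally wrap an LLM prompt.
# The assistant returns ("call", function, args) or ("final", answer, {}).
UserAgent = Callable[[ToolAPI], str]
AssistantAgent = Callable[[ToolAPI, str, list[Step]], tuple[str, str, dict]]
ExecutorAgent = Callable[[ToolAPI, str, dict], str]


def simulate(tool: ToolAPI, user: UserAgent, assistant: AssistantAgent,
             executor: ExecutorAgent, max_turns: int = 5) -> Instance:
    """Run the user -> assistant <-> executor loop for one tool."""
    inst = Instance(tool=tool.name, instruction=user(tool))
    for _ in range(max_turns):
        kind, payload, args = assistant(tool, inst.instruction, inst.steps)
        if kind == "final":
            inst.response = payload
            break
        # Otherwise the assistant requested a call; the executor simulates the API.
        observation = executor(tool, payload, args)
        inst.steps.append(Step(payload, args, observation))
    return inst


# Deterministic stand-ins for the LLM-backed agents (illustration only).
def stub_user(tool: ToolAPI) -> str:
    return f"Use {tool.name} to look up the weather in Paris."


def stub_assistant(tool: ToolAPI, instruction: str, steps: list[Step]):
    if not steps:
        return ("call", tool.functions[0], {"city": "Paris"})
    return ("final", f"The weather is: {steps[-1].observation}", {})


def stub_executor(tool: ToolAPI, action: str, args: dict) -> str:
    return f"sunny in {args['city']}"


if __name__ == "__main__":
    weather = ToolAPI("WeatherAPI", "Weather", ["get_current_weather"])
    inst = simulate(weather, stub_user, stub_assistant, stub_executor)
    print(len(inst.steps), inst.response)
```

Running the script prints `1 The weather is: sunny in Paris`. Repeating `simulate` over many tool descriptions, with several instructions per tool, is one plausible way to build a corpus of the size described above. The resulting `Instance` records would then be serialized into fine-tuning examples.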