CL · Sep 3, 2025

Advancing SLM Tool-Use Capability using Reinforcement Learning

Dhruvi Paprunia, Vansh Kharidia, Pankti Doshi

arXiv:2509.04518v2 · 1 citation · AIC

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enhancing tool-use for SLMs in resource-constrained settings, representing an incremental advancement in AI agent capabilities.

The study tackled the problem of improving tool-use accuracy in Small Language Models (SLMs) by applying Reinforcement Learning with Group Relative Policy Optimization (GRPO), resulting in significant improvements in capabilities such as function calling and JSON output.

In an era where tool-augmented AI agents are becoming increasingly vital, our findings highlight the ability of Group Relative Policy Optimization (GRPO) to empower Small Language Models (SLMs), which are traditionally constrained in tool use. The ability to use tools effectively has become a defining feature of Large Language Models (LLMs), allowing them to access external data and internal resources. As AI agents grow more sophisticated, tool-use capabilities have become indispensable. While LLMs have made significant progress in this area, SLMs still struggle to integrate tool use accurately, especially in resource-constrained settings. This study investigates how Reinforcement Learning, specifically GRPO, can enhance the tool-use accuracy of SLMs. By designing a well-defined reward system that reinforces structured JSON output, correct tool selection, and precise parameter usage, we demonstrate that GRPO enables SLMs to achieve significant improvements in tool-use capabilities (function calling and JSON output). Our approach provides a computationally efficient training method that enhances SLMs' practical deployment in real-world AI applications.
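
To make the reward design concrete, here is a minimal sketch of how such a reward might be scored and turned into a GRPO training signal. It is not the authors' implementation: the function names, the `name`/`arguments` keys of the tool call, and the 0.25/0.25/0.5 weights are all assumptions chosen for illustration. The second function shows the group-relative step that gives GRPO its name, where each sampled completion's reward is normalized against the mean and standard deviation of its own group.

```python
import json
from statistics import mean, pstdev


def tool_call_reward(completion: str, expected: dict) -> float:
    """Score one completion against a reference tool call.

    Hypothetical scheme (weights are illustrative, not from the paper):
    0.25 for valid structured JSON, 0.25 for the correct tool name,
    0.5 for exactly matching arguments.
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # output is not parseable JSON
    if not isinstance(call, dict):
        return 0.0  # valid JSON, but not an object describing a call

    reward = 0.25  # structured JSON output
    if call.get("name") == expected["name"]:
        reward += 0.25  # correct tool selection
        if call.get("arguments") == expected["arguments"]:
            reward += 0.5  # precise parameter usage
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Assumed reference call for a single prompt.
    expected = {"name": "get_weather", "arguments": {"city": "Paris"}}
    group = [
        '{"name": "get_weather", "arguments": {"city": "Paris"}}',
        '{"name": "get_weather", "arguments": {"city": "Rome"}}',
        '{"name": "search", "arguments": {}}',
        "The weather in Paris is sunny.",
    ]
    rewards = [tool_call_reward(c, expected) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the other completions in the same group, no separate value model is needed, which is consistent with the computational-efficiency claim above. A graded reward like this one also gives partially correct calls a stronger signal than malformed output.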
