Vansh Kharidia

h-index4
2papers

2 Papers

CLSep 3, 2025
Advancing SLM Tool-Use Capability using Reinforcement Learning

Dhruvi Paprunia, Vansh Kharidia, Pankti Doshi

In an era where tool-augmented AI agents are becoming increasingly vital, our findings highlight the ability of Group Relative Policy Optimization (GRPO) to empower SLMs, which are traditionally constrained in tool use. The ability to use tools effectively has become a defining feature of Large Language Models (LLMs), allowing them to access external data and internal resources. As AI agents grow more sophisticated, tool-use capabilities have become indispensable. While LLMs have made significant progress in this area, Small Language Models (SLMs) still face challenges in accurately integrating tool use, especially in resource-constrained settings. This study investigates how Reinforcement Learning, specifically Group Relative Policy Optimization (GRPO), can enhance the tool-use accuracy of SLMs. By designing a well-defined reward system that reinforces structured JSON output, correct tool selection, and precise parameter usage, we demonstrate that GRPO enables SLMs to achieve significant improvements in tool-use capabilities (function calling/JSON output). Our approach provides a computationally efficient training method that enhances SLMs practical deployment in real-world AI applications.

AIOct 21, 2024
LightFusionRec: Lightweight Transformers-Based Cross-Domain Recommendation Model

Vansh Kharidia, Dhruvi Paprunia, Prashasti Kanikar

This paper presents LightFusionRec, a novel lightweight cross-domain recommendation system that integrates DistilBERT for textual feature extraction and FastText for genre embedding. Important issues in recommendation systems, such as data sparsity, computational efficiency, and cold start issues, are addressed in methodology. LightFusionRec uses a small amount of information to produce precise and contextually relevant recommendations for many media formats by fusing genre vector embedding with natural language processing algorithms. Tests conducted on extensive movie and book datasets show notable enhancements in suggestion quality when compared to conventional methods. Because of its lightweight design, the model can be used for a variety of purposes and allows for ondevice inference. LightFusionRec is a noteworthy development in cross-domain recommendation systems, providing accurate and scalable recommendations to improve user experience on digital content platforms.