AI, CL, LG · Jun 25, 2025

Enhancing Reasoning Capabilities in SLMs with Reward Guided Dataset Distillation

arXiv:2507.00054v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This work aims to improve the reasoning capabilities of small language models (SLMs) so they can be deployed more efficiently. It is an incremental advance in dataset distillation techniques.

The study addresses the limited generalisability of knowledge distillation for SLMs. It proposes AdvDistill, a reward-guided dataset distillation framework that uses rule-based verifiers to weight teacher responses, and it reports significant performance improvements on mathematical and complex reasoning tasks.

The push to compress and impart the proficiency of Large Language Models (LLMs) into more deployable and efficient Small Language Models (SLMs) has benefited from improvements in knowledge distillation (KD) techniques. These techniques allow a smaller student model to learn from the responses of a larger, more capable teacher model. However, distillation often revolves around the student model merely copying the teacher's in-distribution responses, limiting its generalisability. This limitation is amplified on reasoning tasks, and the process can be computationally expensive. In this study, we propose AdvDistill, a reward-guided dataset distillation framework. We utilise multiple generations (responses) from a teacher for each prompt and assign rewards based on rule-based verifiers. These varying and normally distributed rewards serve as weights when training student models. Our methods and their subsequent behavioural analysis demonstrate a significant improvement in student model performance for mathematical and complex reasoning tasks, showcasing the efficacy and benefits of incorporating a rewarding mechanism in dataset distillation processes.
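The reward-weighting idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the verifier rule, the reward values (1.0, 0.2, 0.0), the softmax normalisation, and the function names `rule_based_reward`, `reward_weights`, and `weighted_nll` are all assumptions made for illustration. The sketch scores several teacher responses to one prompt with a simple rule, turns the rewards into weights, and uses those weights to scale each response's contribution to the student's loss.

```python
import math
import re


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Hypothetical verifier: checks the final 'Answer: <number>' line.

    The reward values are illustrative assumptions, not taken from the paper.
    """
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", response)
    if match is None:
        return 0.0  # no parseable answer
    if match.group(1) == gold_answer:
        return 1.0  # well-formed and correct
    return 0.2      # well-formed but wrong


def reward_weights(rewards, temperature=1.0):
    """Softmax over the rewards for one prompt (an assumed normalisation)."""
    top = max(rewards)
    exps = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_nll(neg_log_likelihoods, weights):
    """Reward-weighted sum of per-response student losses."""
    return sum(w * nll for w, nll in zip(weights, neg_log_likelihoods))


# Three teacher generations for the same prompt ("What is 6 * 2?").
responses = [
    "6 * 2 = 12. Answer: 12",
    "6 + 2 = 8, so roughly 13. Answer: 13",
    "I am not sure.",
]
rewards = [rule_based_reward(r, "12") for r in responses]
weights = reward_weights(rewards)

# Placeholder per-response negative log-likelihoods from a student model.
student_nll = [1.5, 2.0, 3.0]
loss = weighted_nll(student_nll, weights)
print(rewards, [round(w, 3) for w in weights], round(loss, 3))
```

In a real training loop the placeholder `student_nll` values would come from the student model's token-level cross-entropy on each teacher response, and the weighting scheme would follow whatever the paper specifies.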

