SE AI CL LG · Feb 25

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson

arXiv:2602.22124v12.9 h-index: 29

Originality: Highly original

AI Analysis

The work addresses the challenge of making SLMs effective for software engineering tasks and offers a cost-efficient alternative to large models, though it is incremental in that it builds on existing collaboration and fine-tuning methods.

The paper tackles the underperformance of small language models (SLMs) on long-horizon software engineering tasks such as SWE-bench, where they exhibit action looping and low resolution rates. It introduces SWE-Protégé, a post-training framework that teaches an SLM to selectively collaborate with an expert model, reaching 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art.

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).
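The control flow described in the abstract (the SLM stays the sole decision-maker, stalled states are recognized, and the expert is consulted sparingly) can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the paper's method: in SWE-Protégé the decision to ask for help is learned through fine-tuning and reinforcement learning, whereas here a hand-written repetition heuristic stands in for it. All names (`is_stalled`, `run_episode`, `looping_policy`, `toy_expert`), the window and repeat thresholds, and the call budget of 4 are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interfaces, not taken from the paper.
Policy = Callable[[List[str], List[str]], str]  # (past actions, expert hints) -> next action
Expert = Callable[[List[str]], str]             # past actions -> hint


def is_stalled(actions: List[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Return True if one action fills at least `max_repeats` of the last `window` steps."""
    recent = actions[-window:]
    return bool(recent) and max(Counter(recent).values()) >= max_repeats


def run_episode(policy: Policy, expert: Expert,
                max_steps: int = 20, expert_budget: int = 4) -> Tuple[List[str], int]:
    """Run one toy episode; the policy remains the only component that picks actions."""
    actions: List[str] = []
    hints: List[str] = []
    last_help = 0  # index into `actions` at which the most recent hint arrived
    for _ in range(max_steps):
        # Inspect only the steps taken since the last hint, so one stall costs one call.
        if len(hints) < expert_budget and is_stalled(actions[last_help:]):
            hints.append(expert(actions))
            last_help = len(actions)
        action = policy(actions, hints)
        actions.append(action)
        if action == "submit":
            break
    return actions, len(hints)


def looping_policy(actions: List[str], hints: List[str]) -> str:
    """Toy policy: repeats one command until a hint arrives, follows it, then submits."""
    if not hints:
        return "run_tests"
    if actions and actions[-1] == hints[-1]:
        return "submit"
    return hints[-1]


def toy_expert(actions: List[str]) -> str:
    """Toy expert: always suggests the same next action."""
    return "edit_file"


trajectory, expert_calls = run_episode(looping_policy, toy_expert)
```

In this toy run the policy repeats `run_tests` three times, the heuristic flags the stall, a single hint is requested, and the policy follows it before submitting. Capping calls with `expert_budget` mirrors the sparse expert usage the abstract reports, but the cap itself is an assumption of this sketch.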
