SkillGen: Verified Inference-Time Agent Skill Synthesis

Yuchen Ma, Yue Huang, Han Bao, Haomin Zhuang, Swadheen Shukla, Michel Galley, Xiangliang Zhang, Stefan Feuerriegel

arXiv:2605.10999 · 45.92 citations

Predicted impact: top 1% in LG (last 90 days) · Originality: highly original

AI Analysis

For LLM agent developers, SkillGen automates the creation of high-quality, reusable skills that improve agent performance without retraining, addressing the bottleneck of manual skill authoring.

SkillGen is a multi-agent framework that synthesizes auditable skills from agent trajectories using contrastive induction over successes and failures. It empirically verifies each skill's effect by comparing outcomes on the same instances with and without the skill. It consistently improves held-out performance, outperforms existing baselines, and produces skills that transfer across models.
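To make "contrastive induction over successes and failures" concrete, here is a minimal sketch under stated assumptions. SkillGen performs this step with LLM agents; the version below swaps in simple frequency counting so the three kinds of signal are easy to see: success patterns, recurring failure modes, and behaviors present in a nearby success but missing from a failure. The `Trajectory` record, the representation of actions as plain strings, the `min_support` threshold, and approximating "nearby" as "same task instance" are all hypothetical choices for illustration, not the paper's method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    # Hypothetical record: the task instance attempted, the agent's actions
    # as plain strings, and whether the attempt succeeded.
    instance_id: str
    actions: tuple[str, ...]
    success: bool


def contrastive_patterns(trajs, min_support=0.5):
    """Frequency-based stand-in for contrastive induction (not SkillGen's LLM-based step)."""
    wins = [t for t in trajs if t.success]
    losses = [t for t in trajs if not t.success]

    def doc_freq(group):
        # Fraction of trajectories in `group` containing each action at least once.
        counts = Counter(a for t in group for a in set(t.actions))
        return {a: c / len(group) for a, c in counts.items()} if group else {}

    win_freq, loss_freq = doc_freq(wins), doc_freq(losses)

    # Success patterns: common in successes, rare in failures.
    success_patterns = sorted(
        a for a, f in win_freq.items()
        if f >= min_support and loss_freq.get(a, 0.0) < min_support
    )
    # Failure modes: common in failures, rare in successes.
    failure_modes = sorted(
        a for a, f in loss_freq.items()
        if f >= min_support and win_freq.get(a, 0.0) < min_support
    )

    # Missing behaviors: actions taken in a success on the same instance
    # (our assumed notion of "nearby") that the failed attempt never took.
    missing = Counter()
    for fail in losses:
        for win in wins:
            if win.instance_id == fail.instance_id:
                missing.update(set(win.actions) - set(fail.actions))
    return success_patterns, failure_modes, missing


# Toy usage with made-up trajectories on two instances.
trajs = [
    Trajectory("t1", ("read_docs", "run_tests", "submit"), True),
    Trajectory("t1", ("submit",), False),
    Trajectory("t2", ("read_docs", "run_tests", "submit"), True),
    Trajectory("t2", ("guess", "submit"), False),
]
success_patterns, failure_modes, missing = contrastive_patterns(trajs)
print(success_patterns)         # ['read_docs', 'run_tests']
print(failure_modes)            # ['guess']
print(sorted(missing.items()))  # [('read_docs', 2), ('run_tests', 2)]
```

Note that `submit` is filtered out of the success patterns because it appears in every failure too; contrasting against failures, rather than summarizing successes alone, is what removes such uninformative steps.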

Skills are a promising way to improve LLM agent capabilities without retraining while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a multi-agent framework that synthesizes a single auditable skill from trajectories generated by a base agent. The output is a human-readable artifact that can be inspected before use. Rather than merely summarizing trajectories, SkillGen applies contrastive induction to both successful and failed trajectories to identify reusable success patterns, recurring failure modes, and behaviors that appear in nearby successes but are missing from failures. SkillGen then generates candidate skills and iteratively refines them into a final skill. A key novelty of SkillGen is that it models agent skills as interventions, which makes it possible to empirically verify a skill's net effect on overall performance. Specifically, we compare outcomes on the same instances with and without the skill, so that we account for both repairs (cases where the skill fixes a baseline failure) and regressions (cases where the skill breaks a baseline success). Across a broad range of agents and datasets, SkillGen consistently improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.
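The paired, with-versus-without comparison can be illustrated with a short sketch. It assumes outcomes are recorded as a dictionary from a hypothetical instance ID to a pass/fail boolean, once for the base agent and once with the skill. Counting repairs (fail to pass) and regressions (pass to fail) on the shared instances gives a net effect of (repairs − regressions) / n, which is exactly the change in accuracy on those instances. The `pick_skill` rule (keep the candidate with the largest positive net effect, else none) is an assumption for illustration, not necessarily how SkillGen decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedEffect:
    repairs: int      # baseline failed, with-skill run succeeded
    regressions: int  # baseline succeeded, with-skill run failed
    n: int            # number of instances evaluated under both conditions

    @property
    def net_effect(self) -> float:
        # Equals accuracy(with skill) - accuracy(without skill) on the shared instances.
        return (self.repairs - self.regressions) / self.n


def paired_effect(without_skill: dict[str, bool], with_skill: dict[str, bool]) -> PairedEffect:
    """Compare the same instances with and without the skill."""
    shared = without_skill.keys() & with_skill.keys()
    if not shared:
        raise ValueError("no shared instances to compare")
    repairs = sum(1 for i in shared if not without_skill[i] and with_skill[i])
    regressions = sum(1 for i in shared if without_skill[i] and not with_skill[i])
    return PairedEffect(repairs, regressions, len(shared))


def pick_skill(baseline: dict[str, bool], candidates: dict[str, dict[str, bool]]):
    """Hypothetical selection rule: keep the candidate with the largest positive net effect."""
    best_name, best_effect = None, 0.0
    for name, outcomes in candidates.items():
        effect = paired_effect(baseline, outcomes).net_effect
        if effect > best_effect:
            best_name, best_effect = name, effect
    return best_name, best_effect


# Toy usage with made-up outcomes on four instances.
baseline = {"a": True, "b": False, "c": False, "d": True}
skill_v1 = {"a": True, "b": True, "c": True, "d": False}    # 2 repairs, 1 regression
skill_v2 = {"a": False, "b": True, "c": False, "d": False}  # 1 repair, 2 regressions

effect = paired_effect(baseline, skill_v1)
print(effect.repairs, effect.regressions, effect.net_effect)  # 2 1 0.25
print(pick_skill(baseline, {"v1": skill_v1, "v2": skill_v2}))  # ('v1', 0.25)
```

Tracking repairs and regressions separately, instead of only the aggregate score, shows when a skill that fixes some failures is also breaking previously solved instances; `skill_v2` repairs instance `b` but still has a negative net effect.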
