AI · May 9

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

arXiv:2605.09192 · 93.9 · Has Code
AI Analysis

For researchers and practitioners in skill distillation and autonomous agents, this work provides a principled metric and method to ensure skills are empirically grounded, addressing a key bottleneck in skill generation.

The paper argues that robust skills should be grounded in environment interaction rather than in prior plans, and introduces the Posterior Distillation Index (PDI) to quantify how well a skill is grounded. SPARK, which uses PDI for online verification, generates skills that surpass no-skill baselines and outperform human-written skills across 86 tasks, on student models whose inference cost is up to 1,000x lower than that of teacher models.

Agent skills, in the form of human-written procedural documents, can remarkably improve task success rates, but their quality is difficult to assess without environment-grounded verification. Existing skill generation methods rely heavily on preference logs rather than direct environment interaction, often yielding negligible or even degraded gains. We identify this as a fundamental timing bottleneck: robust skills should be posterior-based, that is, distilled from empirical environment interaction rather than from prior plans. In this study, we introduce the Posterior Distillation Index (PDI), a trajectory-level metric that quantifies how well a distilled skill is grounded in task-environment evidence. To operationalize PDI, we present SPARK (Structured Pipelines for Autonomous Runnable tasKs and sKill generation), which preserves task execution evidence for full trajectory-level analysis. SPARK generates the environment-verified trajectories used to compute PDI and applies PDI as an online diagnostic and intervention signal to ensure posterior skill formation. Across 86 runnable tasks, SPARK-generated skills consistently surpass no-skill baselines and outperform human-written skills on student models (whose inference cost is up to 1,000x lower than that of teacher models). These findings show that PDI-guided distillation produces efficient and transferable skills grounded in task-environment interaction. We release our code at https://github.com/EtaYang10th/spark-skills.

