SE · AI · May 9

Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries

arXiv:2605.10990 · 84.8
Predicted impact: top 11% in SE · last 90 days · Originality: Incremental advance
AI Analysis

For developers maintaining LLM agent skill libraries, this work provides a precision-first method to detect and repair skill drift caused by evolving external dependencies.

LLM agent skill libraries silently decay as external dependencies evolve. The authors formulate skill drift as contract violation and propose a method that extracts executable environment contracts from skill documents, achieving zero false alarms over 599 no-drift cases (Wilson 95% CI [0,0.6]%) and 100% precision with 76% recall in known-drift verification, while improving one-round repair success from 10% to 78%.
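The interval quoted above can be checked by hand. The sketch below is a generic Wilson score interval, not code from the paper; the function name `wilson_interval` and the choice of z = 1.96 for a 95% level are assumptions made for illustration. With zero false alarms in 599 trials, the upper bound reduces to z²/(n + z²).

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Zero false alarms over 599 no-drift cases, as reported in the summary.
low, high = wilson_interval(0, 599)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

The upper bound comes out near 0.64%, consistent with the reported [0,0.6]% after rounding.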

LLM agents increasingly rely on reusable skill libraries, but these skills silently decay as the external services, packages, APIs, and configurations they reference evolve. Existing monitors detect such changes at the wrong granularity: they observe values, not the role those values play in a skill. A version string in a comment is noise; the same string in a pinned dependency is an operational obligation. We formulate skill drift as contract violation and introduce \sgname{}, which extracts executable environment contracts from skill documents and validates only those role-bearing assumptions against known or live conditions. This distinction turns noisy monitoring into a precision-first maintenance signal. Contract-free CI probes produce 40% false positives, while \sgname{} raises zero false alarms over 599 no-drift and hard-negative cases (Wilson 95% CI [0,0.6]%). In known-drift verification, \sgname{} achieves 100% precision and 76% recall with the strongest backbone; in a pre-registered study over 49 real skills, it discovers live drift with 86% conservative precision. Violated contracts also make repair actionable, improving one-round success from 10% without localization to 78%. We release \dbname{}, an 880-pair benchmark for skill degradation.
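The abstract's distinction between a value and the role it plays can be illustrated with a toy example. This is a minimal sketch, not the authors' extraction method: the `Contract` class, the `pinned_dependency` role label, the `name==version` regex, and the `environment` dictionary are all hypothetical, chosen only to show why a version string in a comment is ignored while the same string in a pin is checked.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    role: str      # hypothetical role label, e.g. "pinned_dependency"
    name: str      # the dependency the skill relies on
    expected: str  # the version the skill assumes


# Assumed pin syntax for this sketch: "package==version" alone on a line.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)==([0-9][A-Za-z0-9.]*)\s*$")


def extract_contracts(skill_text: str) -> list[Contract]:
    """Keep only role-bearing assumptions; mentions in comments are noise."""
    contracts = []
    for line in skill_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # a version string in a comment carries no obligation
        match = PIN.match(line)
        if match:
            contracts.append(Contract("pinned_dependency", match.group(1), match.group(2)))
    return contracts


def violated(contracts: list[Contract], environment: dict[str, str]) -> list[Contract]:
    """Return contracts whose expected version differs from (or is absent in) the environment."""
    return [c for c in contracts if environment.get(c.name) != c.expected]


skill = """\
# tested once with requests 2.31.0
requests==2.31.0
"""
environment = {"requests": "2.32.3"}  # hypothetical observed state

contracts = extract_contracts(skill)
drift = violated(contracts, environment)
print(drift)
```

Only the pinned line yields a contract, so the comment can never raise an alarm; the returned violation also names the exact dependency and expected version, which is the kind of localization the abstract credits for making repair actionable.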

