Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
For roboticists deploying compositional skill libraries, this work provides what the authors describe as the first principled method to govern skill updates without full revalidation, though the dominant-skill effect is demonstrated on a small set of tasks and the probe's performance varies across settings.
The paper introduces a paired-sampling cross-version swap protocol to study how skill updates affect composition outcomes in robot policies. It reports a dominant-skill effect in which one skill's atomic success rate (86.7%) far exceeds every other skill's (≤26.7%), and whether that skill enters a composition shifts composition success by up to +50pp. The authors propose an atomic-quality probe that achieves 64.6% oracle match at zero per-decision cost (vs 87.5% for full revalidation) and a Hybrid Selector that narrows the gap to ~12pp at 46% of full-revalidation cost.
Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task, we discover a dominant-skill effect: one ECM achieves an 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks, we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6, the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 narrows that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
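To make the trade-off between the atomic-only probe and full revalidation concrete, the following Python sketch shows one way a hybrid selector of this kind could be organized. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define the parameter m, so the sketch assumes it is a margin in percentage points below which the atomic probe is considered too close to call and the composition is revalidated. The names `SkillUpdate`, `atomic_probe`, `hybrid_select`, `evaluate`, and the `revalidate` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SkillUpdate:
    """One skill-update decision: keep the old version or swap in the new one."""
    skill: str
    atomic_old: float  # atomic success rate of the current version, in [0, 1]
    atomic_new: float  # atomic success rate of the candidate version, in [0, 1]


def atomic_probe(update: SkillUpdate) -> str:
    """Zero per-decision cost: choose the version with the higher atomic success rate."""
    return "new" if update.atomic_new > update.atomic_old else "old"


def hybrid_select(
    update: SkillUpdate,
    revalidate: Callable[[SkillUpdate], str],
    margin_pp: float = 10.0,
) -> Tuple[str, bool]:
    """Return (choice, paid_for_revalidation).

    ASSUMPTION: margin_pp plays the role of the abstract's m. If the two
    versions' atomic success rates differ by less than margin_pp percentage
    points, the probe is treated as inconclusive and the (expensive)
    composition-level revalidation decides instead.
    """
    gap_pp = abs(update.atomic_new - update.atomic_old) * 100.0
    if gap_pp < margin_pp:
        return revalidate(update), True
    return atomic_probe(update), False


def evaluate(
    updates: List[SkillUpdate],
    oracle: List[str],
    revalidate: Callable[[SkillUpdate], str],
    margin_pp: float,
) -> Tuple[float, float]:
    """Return (oracle-match rate, fraction of decisions that were revalidated)."""
    matches = 0
    revalidated = 0
    for update, best in zip(updates, oracle):
        choice, paid = hybrid_select(update, revalidate, margin_pp)
        matches += int(choice == best)
        revalidated += int(paid)
    n = len(updates)
    return matches / n, revalidated / n
```

Under this reading, `margin_pp=0` reduces to the atomic-only probe, a very large margin reduces to full revalidation, and sweeping the margin between those extremes traces a cost-versus-oracle-match curve like the Pareto frontier the abstract describes. The second value returned by `evaluate` is the quantity that would correspond to the reported "46% of full-revalidation cost", assuming each revalidated decision costs the same.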