RO AI May 18

Confidence-Gated Robot Autonomy: When Does Uncertainty Actually Help?

arXiv:2605.18045 1.3
AI Analysis

For roboticists designing selective autonomy systems, this work clarifies that uncertainty-based gating is only beneficial above a competence threshold and that threshold choice dominates outcomes.

The paper investigates when predictive uncertainty is useful for threshold-gated robot autonomy. It finds that uncertainty ranking quality improves with model competence, that simple proxies such as softmax heuristics suffice for gating once the model is competent, and that semantic OOD detection remains poor.

Robotic systems often use predictive uncertainty to decide whether to act autonomously or defer to a fallback policy. In threshold-gated autonomy, uncertainty matters mainly through its ability to rank likely errors. Standard metrics such as expected calibration error and AUROC do not directly test whether uncertainty changes act/defer decisions. We therefore evaluate uncertainty using Spearman rank correlation, paired bootstrap equivalence testing, and act/defer agreement. Across three temporal activity-recognition benchmarks, we find a dataset-dependent competence regime below which uncertainty provides a weak and unstable error ranking. Above this regime, softmax heuristics, MC Dropout, and ensembles produce similar gating behavior, while threshold choice has a much larger effect on execution outcomes. A multi-seed embodied simulation shows the same pattern for collision rate and cost once realized autonomy is matched. Under temporal covariate shift, ranking quality remains stable, but fine-grained semantic OOD detection remains near chance. These results suggest that simple uncertainty proxies can suffice for selective gating once the base model is competent, but not for semantic novelty detection.
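
The three evaluation tools named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the synthetic data, the matched autonomy rate of 0.7, and the percentile-interval form of the paired bootstrap are all assumptions made for illustration. The sketch scores how well an uncertainty signal ranks errors (Spearman), how often two uncertainty estimators make the same act/defer decision at a matched autonomy rate, and how far apart their ranking quality is under paired resampling.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float).ravel()
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)      # highest rank inside each tie group
    first = last - counts + 1     # lowest rank inside each tie group
    return ((first + last) / 2.0)[inverse]

def spearman(uncertainty, errors):
    """Spearman rank correlation between uncertainty and 0/1 errors."""
    ru, re = average_ranks(uncertainty), average_ranks(errors)
    return float(np.corrcoef(ru, re)[0, 1])

def act_mask(uncertainty, autonomy_rate):
    """Act on roughly the `autonomy_rate` fraction with lowest uncertainty."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    tau = np.quantile(uncertainty, autonomy_rate)  # gating threshold
    return uncertainty <= tau                      # True = act, False = defer

def act_defer_agreement(u_a, u_b, autonomy_rate):
    """Fraction of samples on which two estimators make the same decision."""
    same = act_mask(u_a, autonomy_rate) == act_mask(u_b, autonomy_rate)
    return float(np.mean(same))

def paired_bootstrap_diff(u_a, u_b, errors, n_boot=200, seed=0):
    """95% percentile interval for spearman(u_a) - spearman(u_b).

    Both estimators are scored on the same resampled indices (paired).
    An equivalence claim would additionally need a margin chosen in
    advance; that margin is not specified here.
    """
    rng = np.random.default_rng(seed)
    u_a, u_b, errors = (np.asarray(v, dtype=float) for v in (u_a, u_b, errors))
    n = len(errors)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        diffs[i] = spearman(u_a[idx], errors[idx]) - spearman(u_b[idx], errors[idx])
    return np.quantile(diffs, [0.025, 0.975])

# Synthetic stand-ins (hypothetical): a cheap proxy and a noisier variant of it.
rng = np.random.default_rng(0)
n = 2000
errors = (rng.random(n) < 0.2).astype(float)          # 1 = model was wrong
u_proxy = 0.5 * errors + rng.normal(0.0, 0.3, n)      # e.g. a softmax heuristic
u_other = u_proxy + rng.normal(0.0, 0.05, n)          # e.g. an ensemble score

rho = spearman(u_proxy, errors)
agreement = act_defer_agreement(u_proxy, u_other, autonomy_rate=0.7)
ci_low, ci_high = paired_bootstrap_diff(u_proxy, u_other, errors)
print(rho, agreement, ci_low, ci_high)
```

Matching the autonomy rate through a quantile threshold, rather than fixing a raw threshold, keeps the comparison between estimators on equal footing, which mirrors the abstract's point that outcomes should be compared once realized autonomy is matched.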

