AI · Apr 17

Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs

arXiv:2604.16753 · 56.3 · h-index: 7

Predicted impact: top 66% in AI · last 90 days · Originality: Incremental advance

AI Analysis

For LLM-based autonomous agents, this work addresses the bottleneck of metacognitive control, offering a preliminary framework to improve reliability and trustworthiness.

LLMs acting as autonomous agents suffer from context pollution and overthinking because they lack metacognitive governance. MESA-S, a framework that introduces delayed appraisal and epistemic vigilance, reduces unnecessary reasoning loops and prevents confidence inflation in single-agent orchestration.

As large language models (LLMs) transition into autonomous agents integrated with extensive tool ecosystems, traditional routing heuristics increasingly succumb to context pollution and "overthinking". We argue that the bottleneck is not a deficit in algorithmic capability or skill diversity, but the absence of disciplined second-order metacognitive governance. Our scientific contribution is the computational translation of human cognitive control (specifically delayed appraisal, epistemic vigilance, and region-of-proximal offloading) into a single-agent architecture. We introduce MESA-S (Metacognitive Skills for Agents, Single-agent), a preliminary framework that replaces scalar confidence estimation with a vector separating self-confidence (parametric certainty) from source-confidence (trust in retrieved external procedures). By formalizing a delayed procedural probe mechanism and introducing Metacognitive Skill Cards, MESA-S decouples the awareness of a skill's utility from its token-intensive execution. Evaluated under an In-Context Static Benchmark Evaluation natively executed via Gemini 3.1 Pro, our early results suggest that explicitly programming trust provenance and delayed escalation mitigates supply-chain vulnerabilities, prunes unnecessary reasoning loops, and prevents offloading-induced confidence inflation. This architecture offers a scientifically cautious, behaviorally anchored step toward reliable, epistemically vigilant single-agent orchestration.
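
To make the two-component confidence idea concrete, here is a minimal Python sketch of how a router might keep self-confidence and source-confidence apart and delay skill execution behind a cheap probe. It is an illustration only and not the MESA-S implementation: the names `Confidence`, `Action`, and `decide`, the threshold values, and the ordering of the checks are all assumptions that do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical routing outcomes (labels are assumptions, not from the paper)."""
    ANSWER_DIRECTLY = "answer_directly"  # rely on parametric knowledge
    PROBE_SKILL = "probe_skill"          # inspect the skill cheaply, do not run it yet
    EXECUTE_SKILL = "execute_skill"      # pay the token cost of running the skill
    REJECT_SKILL = "reject_skill"        # the external source is not trusted enough


@dataclass(frozen=True)
class Confidence:
    """Two-component confidence instead of a single scalar."""
    self_conf: float    # parametric certainty, in [0, 1]
    source_conf: float  # trust in the retrieved external procedure, in [0, 1]


def decide(
    conf: Confidence,
    probed: bool,
    self_threshold: float = 0.8,    # assumed value, chosen for illustration
    source_threshold: float = 0.6,  # assumed value, chosen for illustration
) -> Action:
    """Pick the next step from the confidence vector and the probe state."""
    # High parametric certainty: no skill is needed at all.
    if conf.self_conf >= self_threshold:
        return Action.ANSWER_DIRECTLY
    # Delayed appraisal: look at the skill before committing to execute it.
    if not probed:
        return Action.PROBE_SKILL
    # Epistemic vigilance: execute only if the source is trusted after probing.
    if conf.source_conf >= source_threshold:
        return Action.EXECUTE_SKILL
    return Action.REJECT_SKILL
```

For example, `decide(Confidence(self_conf=0.3, source_conf=0.9), probed=False)` returns `Action.PROBE_SKILL`, and the same confidence values with `probed=True` return `Action.EXECUTE_SKILL`. Keeping the two components separate means a highly trusted source never raises `self_conf` by itself, which is one simple way to picture avoiding offloading-induced confidence inflation.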
