CR · AI · May 27

AIRGuard: Guarding Agent Actions with Runtime Authority Control

arXiv:2605.28914 · 75.4 · h-index: 4 · Has Code
Predicted impact: top 16% in CR · last 90 days · Originality: Incremental advance
AI Analysis

For developers of agentic AI systems, AIRGuard provides a runtime guard that operationalizes least privilege to prevent unauthorized tool-mediated side effects.

AIRGuard addresses authority confusion in tool-using language agents, where attacker-controlled context ends up authorizing harmful actions. It reduces attack success from 36.3% to 5.5% on AgentTrap (Sonnet 4.6) while preserving 76.0% benign utility on DTAP-150 (Haiku 4.5).

Tool-using language agents turn model decisions into external side effects: they read files, run scripts, call APIs, send messages, and invoke Model Context Protocol tools. This makes agent attacks different from jailbreaks. The harmful step is often not an obviously forbidden output, but an ordinary executable action that becomes unsafe because attacker-controlled context steers authorized access against the user's interest. We identify this failure mode as authority confusion: untrusted resources may inform reasoning, but they must not authorize side effects. We present AIRGuard, a runtime guard that operationalizes least privilege as action-time authorization. AIRGuard normalizes heterogeneous tool calls, derives task authority into step-level authority, tracks source and target trust, simulates sensitive side effects, audits cross-step risk, and enforces decisions before actions execute. On AgentTrap, AIRGuard reduces Sonnet 4.6 attack success from 36.3% without defense to 5.5%. On DTAP-150, AIRGuard preserves 76.0% benign utility with Haiku 4.5, compared with 52.0% for ARGUS and 42.0% for MELON. An ablation further shows that prompt-only policy helps only modestly, whereas a dedicated runtime authority-control layer gives the agent system direct control over tool-mediated side effects. Code and data are available at https://github.com/Sophie508/AIRGuard.
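The core rule in the abstract, that untrusted resources may inform reasoning but must not authorize side effects, can be illustrated with a minimal sketch. This is not the AIRGuard implementation: the names `Action`, `TaskAuthority`, `authorize`, and `guarded_execute` are hypothetical, the trust model is reduced to two levels, and the side-effect simulation and cross-step audit stages are omitted. It only shows the shape of an action-time check that runs before a tool call executes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional, Set, Tuple


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. the user's own instruction
    UNTRUSTED = "untrusted"  # e.g. a web page, file content, or tool output


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


@dataclass(frozen=True)
class Action:
    """A normalized tool call (hypothetical schema, not AIRGuard's)."""
    tool: str          # e.g. "send_email"
    target: str        # the resource the action touches
    side_effect: bool  # True if the call changes external state
    source: Trust      # trust level of the context that requested it


@dataclass
class TaskAuthority:
    """The (tool, target) pairs that the user's task permits."""
    allowed: Set[Tuple[str, str]] = field(default_factory=set)

    def permits(self, action: Action) -> bool:
        return (action.tool, action.target) in self.allowed


def authorize(action: Action, authority: TaskAuthority) -> Decision:
    """Decide at action time whether a tool call may run."""
    # Simplifying assumption: calls without side effects are always allowed,
    # so untrusted content can still inform the agent's reasoning.
    if not action.side_effect:
        return Decision.ALLOW
    # A side effect requested by untrusted context is never authorized.
    if action.source is Trust.UNTRUSTED:
        return Decision.BLOCK
    # Even trusted requests must stay within the task's authority.
    if not authority.permits(action):
        return Decision.BLOCK
    return Decision.ALLOW


def guarded_execute(
    action: Action,
    authority: TaskAuthority,
    execute: Callable[[Action], Any],
) -> Optional[Any]:
    """Run `execute(action)` only if the guard allows it; otherwise return None."""
    if authorize(action, authority) is Decision.ALLOW:
        return execute(action)
    return None


# Usage: the task authorizes one email; an injected instruction asks for another.
authority = TaskAuthority(allowed={("send_email", "alice@example.com")})
requested = Action("send_email", "alice@example.com", True, Trust.TRUSTED)
injected = Action("send_email", "attacker@example.com", True, Trust.UNTRUSTED)
lookup = Action("read_file", "notes.txt", False, Trust.UNTRUSTED)

sent = []
for act in (requested, injected, lookup):
    guarded_execute(act, authority, lambda a: sent.append(a.target))
```

In this sketch the decision sits outside the model, so a prompt injection that persuades the agent to emit the second email still fails at the guard. That mirrors the abstract's contrast between a prompt-only policy and a dedicated runtime authority-control layer.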

