CRMar 15

Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use

arXiv:2603.1433214.71 citations

Predicted impact top 26% in CR · last 90 daysOriginality Incremental advance

AI Analysis

This addresses security and regulatory compliance for AI agents in dynamic environments, though it appears incremental by building on existing cryptographic and verification techniques.

The paper tackles the problem of AI agents silently changing capabilities after authorization (the capability-identity gap), which enables security risks and violates regulations like the EU AI Act. It proposes cryptographic binding, reproducibility verification, and a verifiable ledger, achieving 97us certificate verification, 0.62ms governance overhead per tool call, and detection of all 12 attack scenarios with zero false positives in evaluations.

AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term this the capability-identity gap}: it enables silent capability escalation and violates EU AI Act traceability requirements. We propose three mechanisms. Capability-bound agent certificates extend X.509 v3 with a skills manifest hash; any tool change invalidates the certificate. Reproducibility commitments leverage LLM inference near-determinism for post-hoc replay verification. A verifiable interaction ledger provides hash-linked, signed records for multi-agent forensic reconstruction. We formalize nine security properties and prove they hold under a realistic adversary model. Our Rust prototype achieves 97us certificate verification (<1ns capability binding overhead, ~1,200,000 faster than BAID's zkVM), 0.62ms total governance overhead per tool call (0.1--1.2% of typical latency), and 4.7X separation from cross-provider outputs (Cohen's d > 1.0 on all four metrics), with best classification at F_1=0.876 (Jaccard, Î¸=0.408); single-provider deployments achieve F_1=0.990 with 11.5 times separation. We evaluate 12 attack scenarios -- silent escalation, tool trojanization, phantom delegation, evidence tampering, collusion, and runtime behavioral attacks validated against NVIDIA's Nemotron-AIQ traces -- each detected with a traceable mechanism, while the MCP+OAuth 2.1 baseline detects none. An end-to-end evaluation over a 5-to-20-agent pipeline with real LLM calls confirms that full governance (G1--G3) adds ~10.8ms per pipeline run (0.12% overhead), scales sub-linearly per agent, and detects all five in-situ attacks with zero false positives.

View on arXiv PDF

Similar