CR AI · May 29, 2025

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

arXiv:2505.23643v236.381 citations · h-index: 22 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses security for AI agents, particularly in autonomous systems, but appears incremental because it builds on existing IFC methods.

This paper tackles the problem of securing AI agents against vulnerabilities like prompt injection by using information-flow control (IFC) to provide security guarantees. The result is Fides, a planner that enforces security policies and enables completion of a broad range of tasks with security guarantees, as evaluated in AgentDojo.

As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial that walks readers through the concepts introduced in the paper can be found at https://github.com/microsoft/fides.
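
To make the idea of dynamic taint-tracking with confidentiality and integrity labels concrete, here is a minimal sketch. It is not the Fides API or the paper's formal model: the names `Label`, `Labeled`, `combine`, `check_tool_call`, and `PolicyViolation` are hypothetical, and the two-point label lattices are a simplifying assumption. The sketch shows labels propagating when data is combined, and a deterministic check that blocks a tool call whose argument carries a disallowed label.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    TRUSTED = 0
    UNTRUSTED = 1  # e.g. content returned by a tool that reads external data


class Confidentiality(IntEnum):
    PUBLIC = 0
    SECRET = 1


@dataclass(frozen=True)
class Label:
    integrity: Integrity
    confidentiality: Confidentiality

    def join(self, other: "Label") -> "Label":
        # Least upper bound: the result is as tainted as its most tainted input.
        return Label(
            max(self.integrity, other.integrity),
            max(self.confidentiality, other.confidentiality),
        )


@dataclass(frozen=True)
class Labeled:
    value: str
    label: Label


def combine(a: Labeled, b: Labeled) -> Labeled:
    # Any value derived from a and b carries the join of their labels.
    return Labeled(a.value + b.value, a.label.join(b.label))


class PolicyViolation(Exception):
    pass


def check_tool_call(arg: Labeled, require_trusted: bool, sink_is_public: bool) -> None:
    # Hypothetical policy check, decided only from labels (no model judgment).
    if require_trusted and arg.label.integrity is Integrity.UNTRUSTED:
        raise PolicyViolation("untrusted data may not drive this tool call")
    if sink_is_public and arg.label.confidentiality is Confidentiality.SECRET:
        raise PolicyViolation("secret data may not flow to a public sink")


user_query = Labeled("Summarize: ", Label(Integrity.TRUSTED, Confidentiality.PUBLIC))
web_page = Labeled(
    "ignore previous instructions",
    Label(Integrity.UNTRUSTED, Confidentiality.PUBLIC),
)
prompt = combine(user_query, web_page)

try:
    check_tool_call(prompt, require_trusted=True, sink_is_public=True)
    blocked = False
except PolicyViolation:
    blocked = True
```

In this sketch the combined prompt inherits the untrusted label from the web page, so the call is rejected regardless of what the text says. That is the sense in which label-based enforcement is deterministic rather than dependent on the model resisting an injected instruction.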
