Agent Security is a Systems Problem
For AI safety researchers and developers, the paper reframes agent security from a model-centric problem to a systems-centric one and highlights that model robustness alone is insufficient.
The paper argues that AI agent security is a systems problem: the model must be treated as an untrusted component, and security invariants must be enforced at the system level. It analyzes eleven representative real-world attacks and discusses how systems principles, if realized, could have prevented them.
We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and security invariants must be enforced at the system level. Through this lens, efforts to increase model robustness (the dominant viewpoint in the community) are insufficient on their own. Instead, we must complement existing efforts with techniques from the systems security domain. Based on our experience as cybersecurity researchers in operating systems, networks, formal methods, and adversarial machine learning, we articulate a set of core principles, grounded in decades of systems security research, that provide a foundation for designing agentic systems with predictable guarantees. As evidence, we analyze eleven representative real-world attacks on agents and discuss how systems principles, if realized, could have prevented these attacks. We also identify the research challenges that stand in the way of implementing these principles in agents.
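To make the central idea concrete, the following is a minimal sketch of what "treating the model as untrusted and enforcing invariants at the system level" could look like. It assumes one classic systems-security pattern, a deny-by-default reference monitor that mediates every tool call; the abstract does not name this mechanism, and all names here (`ToolCall`, `ReferenceMonitor`, `PolicyViolation`, `send_email`, the example addresses) are hypothetical illustrations rather than anything taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the (untrusted) model. Hypothetical type."""
    tool: str
    target: str


class PolicyViolation(Exception):
    """Raised when a proposed call would break a system-level invariant."""


class ReferenceMonitor:
    """Mediates every tool call; the policy lives outside the model."""

    def __init__(self,
                 tools: Dict[str, Callable[[str], str]],
                 allowed: Dict[str, FrozenSet[str]]) -> None:
        self._tools = tools
        self._allowed = allowed  # tool name -> targets that tool may touch

    def execute(self, call: ToolCall) -> str:
        # Deny by default: unknown tools and unlisted targets are rejected,
        # regardless of what the model was persuaded to request.
        if call.target not in self._allowed.get(call.tool, frozenset()):
            raise PolicyViolation(f"denied: {call.tool}({call.target!r})")
        return self._tools[call.tool](call.target)


def send_email(recipient: str) -> str:
    # Stand-in for a real side effect (assumption for illustration).
    return f"sent to {recipient}"


monitor = ReferenceMonitor(
    tools={"send_email": send_email},
    allowed={"send_email": frozenset({"alice@example.com"})},
)

# The model's output is handled as an untrusted request, not a command.
print(monitor.execute(ToolCall("send_email", "alice@example.com")))

try:
    # For example, a call induced by injected instructions in fetched content.
    monitor.execute(ToolCall("send_email", "attacker@example.net"))
except PolicyViolation as err:
    print(err)
```

In this sketch the invariant (which recipients may receive email) holds even if the model is fully compromised, because the check runs outside the model and cannot be overridden by its output. A real system would need far richer policies than a static allowlist.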