CR AIMar 24, 2025

Defeating Prompt Injections by Design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr

DeepMindETH Zurich

arXiv:2503.18813v2144 citationsh-index: 35Has Code

Originality Incremental advance

AI Analysis

This addresses security vulnerabilities in LLM agents for developers and users, representing an incremental improvement with a specific defense mechanism.

The paper tackles the problem of prompt injection attacks in LLM agents by proposing CaMeL, a defense that creates a protective system layer to secure LLMs when handling untrusted data, achieving 77% task success with provable security compared to 84% in an undefended system.

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models are susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows by enforcing security policies when tools are called. We demonstrate effectiveness of CaMeL by solving $77\%$ of tasks with provable security (compared to $84\%$ with an undefended system) in AgentDojo. We release CaMeL at https://github.com/google-research/camel-prompt-injection.

View on arXiv PDF Code

Similar