Stateful Agent Backdoor
For security researchers and developers of LLM-based agents, this work exposes a new, more persistent threat model that extends beyond single-session attacks.
The paper proposes a stateful backdoor attack on LLM-based agents that persists across multiple sessions under permission isolation, achieving 80-95% attack success rate across four models.
Existing backdoor attacks on Large Language Model-based agents remain stateless, executing fixed behaviors confined to a single session. We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation. The attack maintains state through persistent components, enabling autonomous, incremental execution across sessions following a one-time trigger injection. Formally, we model the attack as a Mealy machine and derive a decomposition framework that enables independent per-transition data construction. We instantiate this framework with a primary attack and two extensibility variants. The primary instantiation achieves an attack success rate of 80\%--95\% across four models, with per-transition analysis demonstrating the effectiveness of the decomposition. Extensibility variants with alternative topologies and persistent components demonstrate consistent effectiveness. Code and data are available at https://anonymous.4open.science/r/stateful_agent_backdoor-E89F.