AI · Feb 18

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

arXiv:2602.16901v1 · 8 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses security risks for LLM agents deployed in practical, multi-turn settings. It is an incremental advance in benchmarking.

The authors address the vulnerability of LLM agents to long-horizon attacks in complex environments by introducing AgentLAB, a benchmark that evaluates agent susceptibility. They find that current agents remain highly susceptible and that defenses designed for single-turn interactions do not reliably mitigate these attacks.

LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. Currently, AgentLAB supports five novel attack types (intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning), spanning 28 realistic agentic environments and 644 security test cases. Leveraging AgentLAB, we evaluate representative LLM agents and find that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate long-horizon threats. We anticipate that AgentLAB will serve as a valuable benchmark for tracking progress on securing LLM agents in practical settings. The benchmark is publicly available at https://tanqiujiang.github.io/AgentLAB_main.
