CR AI CL Mar 21

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

arXiv:2603.22341 98.11 citations h-index: 15
AI Analysis

This work addresses security vulnerabilities in autonomous LLM agents, particularly in ecosystems like MCP. It is an incremental advancement over prior text-focused red-teaming methods.

The paper tackles red-teaming of LLM agents, targeting vulnerabilities that emerge through multi-step tool execution. It proposes T-MAP, a trajectory-aware evolutionary search method that automatically generates adversarial prompts to bypass safety guardrails and achieve harmful objectives. Empirical results show that it substantially outperforms baselines in attack realization rate across diverse environments and against models such as GPT-5.2.

While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.
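
The abstract distinguishes between bypassing safety guardrails and actually realizing an objective through tool interactions. The sketch below illustrates how a metric like ARR could be tallied from evaluation records. The abstract does not define ARR formally, so the definition used here (realized trials divided by all trials) is an assumption, and the `Trial` record, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluated attempt (hypothetical record, not from the paper)."""
    refused: bool             # the agent declined the request
    objective_realized: bool  # a judge marked the objective as realized via tool calls


def bypass_rate(trials):
    """Fraction of trials in which the agent did not refuse."""
    if not trials:
        return 0.0
    return sum(1 for t in trials if not t.refused) / len(trials)


def attack_realization_rate(trials):
    """Assumed ARR: fraction of trials whose objective was realized."""
    if not trials:
        return 0.0
    return sum(1 for t in trials if t.objective_realized) / len(trials)


# Toy data: three of four trials get past refusal, but only two are realized.
trials = [
    Trial(refused=False, objective_realized=True),
    Trial(refused=True, objective_realized=False),
    Trial(refused=False, objective_realized=False),
    Trial(refused=False, objective_realized=True),
]

print(f"bypass rate: {bypass_rate(trials):.2f}")
print(f"ARR (assumed definition): {attack_realization_rate(trials):.2f}")
```

Under this assumed definition, a non-refusal that fails partway through the tool sequence counts toward the bypass rate but not toward ARR, which is the gap the abstract says text-only red-teaming misses.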
