MACL
Nov 11, 2025

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

arXiv:2511.08487v1
h-index: 21
Originality: Highly original
AI Analysis

This paper addresses a critical gap in agent safety, giving AI researchers and developers a foundational benchmark for improving robustness against advanced threats.

The paper argues that current safety evaluations for LLM-driven agents overlook sophisticated threats in which malicious intent is concealed or diluted within complex tasks. It reports two findings: safety alignment degrades sharply as intent becomes more concealed, and a 'Complexity Paradox' arises, in which agents appear safer on harder tasks only because of capability limitations.

Current safety evaluations for LLM-driven agents primarily focus on atomic harms, failing to address sophisticated threats where malicious intent is concealed or diluted within complex tasks. We address this gap with a two-dimensional analysis of agent safety brittleness under the orthogonal pressures of intent concealment and task complexity. To enable this, we introduce OASIS (Orthogonal Agent Safety Inquiry Suite), a hierarchical benchmark with fine-grained annotations and a high-fidelity simulation sandbox. Our findings reveal two critical phenomena: safety alignment degrades sharply and predictably as intent becomes obscured, and a "Complexity Paradox" emerges, where agents seem safer on harder tasks only due to capability limitations. By releasing OASIS and its simulation environment, we provide a principled foundation for probing and strengthening agent safety in these overlooked dimensions.
