LG · ML · May 7

Why Does Agentic Safety Fail to Generalize Across Tasks?

Yonatan Slutzky, Yotam Alexander, Tomer Slor, Yoav Nagel, Nadav Cohen

arXiv:2605.06992 · 86.4

AI Analysis

For AI safety researchers, this work identifies a fundamental limitation of current agentic safety approaches, suggesting that incremental improvements may be insufficient and new paradigms are needed.

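The paper argues that safety in AI agents fails to generalize across tasks because of an inherent property of safety itself: the mapping from a task to its safe execution is more complex than the mapping from a task to its execution alone. The authors prove this for linear-quadratic control with $H_{\infty}$-robustness, where the map from task specification to optimal controller has a higher Lipschitz constant with safety requirements than without. They support it empirically in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent.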

AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness, and prove that the mapping from task specification to an optimal controller has a higher Lipschitz constant with safety requirements than without, yielding a Lipschitz bound of independent interest. Empirically, we demonstrate our conclusions in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent. Our findings suggest that current efforts to enhance agentic safety may be insufficient, and point to a need for fundamentally different approaches.
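The Lipschitz claim in the abstract can be made concrete with a toy scalar system. The sketch below is an illustration under stated assumptions and is not the paper's construction or proof. It assumes a scalar plant dx/dt = a·x + b·u + d·w, treats the state-cost weight `q` as the "task specification", and compares how fast the optimal feedback gain changes with `q` for plain LQR versus the standard $H_{\infty}$ (soft-constrained game) Riccati equation with attenuation level `gamma`. All names and parameter values (`lqr_gain`, `hinf_gain`, `empirical_lipschitz`, `a`, `b`, `r`, `d`, `gamma`) are hypothetical choices made for this example.

```python
import math


def lqr_gain(q, a=0.2, b=1.0, r=1.0):
    """Gain k (u = -k x) for scalar LQR with cost integral of q x^2 + r u^2.

    Uses the stabilizing root of the scalar Riccati equation
    2 a p - (b^2 / r) p^2 + q = 0.
    """
    s = b * b / r
    p = (a + math.sqrt(a * a + s * q)) / s
    return b * p / r


def hinf_gain(q, a=0.2, b=1.0, r=1.0, d=1.0, gamma=1.5):
    """Gain for the scalar H-infinity problem with disturbance d * w.

    Uses the game Riccati equation
    2 a p - (b^2 / r - d^2 / gamma^2) p^2 + q = 0,
    which needs b^2 / r > d^2 / gamma^2 for a positive solution.
    """
    s = b * b / r - d * d / (gamma * gamma)
    if s <= 0:
        raise ValueError("gamma is too small for this toy problem")
    p = (a + math.sqrt(a * a + s * q)) / s
    return b * p / r


def empirical_lipschitz(gain_fn, qs):
    """Largest finite-difference slope of gain_fn over consecutive grid points."""
    return max(
        abs(gain_fn(q2) - gain_fn(q1)) / (q2 - q1)
        for q1, q2 in zip(qs, qs[1:])
    )


# Grid of hypothetical "tasks": state-cost weights from 0.1 upward.
qs = [0.1 + 0.05 * i for i in range(100)]

lip_nominal = empirical_lipschitz(lqr_gain, qs)
lip_robust = empirical_lipschitz(hinf_gain, qs)

print(f"task -> LQR gain, empirical Lipschitz estimate:   {lip_nominal:.4f}")
print(f"task -> H-inf gain, empirical Lipschitz estimate: {lip_robust:.4f}")
```

In this toy setting the derivative of the gain with respect to `q` is (b/r) / (2·sqrt(a² + s·q)), where `s` is b²/r for LQR and the smaller value b²/r − d²/gamma² for the robust case, so the robust map is steeper at every `q`. That matches the direction of the abstract's claim, but only for this scalar example with this particular choice of task parameter.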
