Safety is Non-Compositional: A Formal Framework for Capability-Based AI Systems
This work addresses a safety issue that matters to AI developers and policymakers: it reveals a fundamental limitation of compositional approaches to safety.
The paper studies how to ensure safety in AI systems and proves that safety is non-compositional: two individually safe agents can combine to reach a forbidden goal through emergent conjunctive dependencies.
This paper contains the first formal proof that safety is non-compositional in the presence of conjunctive capability dependencies: two agents each individually incapable of reaching any forbidden capability can, when combined, collectively reach a forbidden goal through an emergent conjunctive dependency.
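The claim can be illustrated with a minimal sketch, which is not the paper's formal framework. It assumes that capabilities are plain strings, that a conjunctive dependency is a rule whose premises must all be held before the conclusion is gained, and that an agent is safe when the closure of its capabilities under the rules contains no forbidden capability. The names `closure`, `is_safe`, `RULES`, `AGENT_1`, and `AGENT_2` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumed encoding: a rule is (premises that must ALL be held, capability gained).
Rule = Tuple[FrozenSet[str], str]


def closure(initial: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Return every capability reachable from `initial` by firing rules to a fixpoint."""
    reached = set(initial)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:
        changed = False
        for premises, gained in rule_list:
            # A conjunctive rule fires only when every premise is already held.
            if gained not in reached and premises <= reached:
                reached.add(gained)
                changed = True
    return reached


def is_safe(initial: Iterable[str], rules: Iterable[Rule], forbidden: Set[str]) -> bool:
    """An agent is safe (in this sketch) if its closure avoids all forbidden capabilities."""
    return closure(initial, rules).isdisjoint(forbidden)


# Hypothetical instance: capability "f" is forbidden and needs BOTH "a" and "b".
RULES: List[Rule] = [(frozenset({"a", "b"}), "f")]
FORBIDDEN = {"f"}
AGENT_1 = {"a"}
AGENT_2 = {"b"}

safe_1 = is_safe(AGENT_1, RULES, FORBIDDEN)                   # agent 1 alone
safe_2 = is_safe(AGENT_2, RULES, FORBIDDEN)                   # agent 2 alone
safe_combined = is_safe(AGENT_1 | AGENT_2, RULES, FORBIDDEN)  # pooled capabilities

print(safe_1, safe_2, safe_combined)
```

In this toy instance each agent passes the safety check alone, while the pooled capability set satisfies both premises of the rule and reaches the forbidden capability. Under these assumptions, the safety predicate is not preserved by composition.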