Agentic Microphysics: A Manifesto for Generative AI Safety
For AI safety researchers, the paper offers a paradigm for addressing emergent risks in multi-agent systems, which single-agent and aggregate-level approaches cannot capture.
This paper proposes a methodological framework for AI safety that shifts analysis from isolated models to population-level risks arising from structured agent interactions. It introduces 'agentic microphysics' and 'generative safety' to link local interaction dynamics to collective outcomes, enabling causal explanation and intervention design.
This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among agents, through processes of communication, observation, and mutual influence that shape collective behaviour over time. As the object of analysis shifts, a methodological gap emerges. Approaches focused either on single agents or on aggregate outcomes do not identify the interaction-level mechanisms that generate collective risks or the design variables that control them. A framework is required that links local interaction structure to population-level dynamics in a causally explicit way, allowing both explanation and intervention. We introduce two linked concepts. Agentic microphysics defines the level of analysis: local interaction dynamics in which one agent's output becomes another's input under specific protocol conditions. Generative safety defines the methodology: growing phenomena and eliciting risks from micro-level conditions in order to identify sufficient mechanisms, detect thresholds, and design effective interventions.
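To make the two concepts concrete, the following is a minimal sketch of what "growing" a collective risk from micro-level conditions could look like. It is not the paper's method: it substitutes a simple threshold-cascade toy model, and every name in it (`run_population`, `sweep_thresholds`, `fanout`, `adopt_threshold`, `seed_fraction`) is hypothetical. Each agent reads the outputs of a fixed set of peers (the interaction protocol) and switches to a risky behaviour once a sufficient share of those outputs is risky (the local rule). Sweeping the local rule shows how a population-level outcome can be traced to a micro-level design variable.

```python
import random


def run_population(n_agents=200, fanout=4, adopt_threshold=0.25,
                   seed_fraction=0.05, steps=50, seed=0):
    """Toy model (an assumption, not taken from the paper).

    Returns the final fraction of agents showing the risky behaviour.
    """
    rng = random.Random(seed)
    # Protocol condition: each agent reads the outputs of `fanout` fixed peers.
    peers = [
        rng.sample([j for j in range(n_agents) if j != i], fanout)
        for i in range(n_agents)
    ]
    # Micro-level initial condition: a small share of agents start out risky.
    state = [rng.random() < seed_fraction for _ in range(n_agents)]
    for _ in range(steps):
        # Local rule: one agent's output becomes another agent's input.
        # An agent turns risky once enough of its peers are risky, and it
        # never reverts.
        new_state = [
            state[i]
            or (sum(state[j] for j in peers[i]) / fanout >= adopt_threshold)
            for i in range(n_agents)
        ]
        if new_state == state:  # fixed point reached
            break
        state = new_state
    return sum(state) / n_agents


def sweep_thresholds(thresholds):
    """Vary the local rule and record the resulting population-level outcome."""
    return {t: run_population(adopt_threshold=t) for t in thresholds}


if __name__ == "__main__":
    for t, frac in sweep_thresholds([0.0, 0.25, 0.5, 0.75, 1.0]).items():
        print(f"adopt_threshold={t:.2f} -> final risky fraction={frac:.2f}")
```

Because the same seed fixes both the peer graph and the initial states, differences between runs are attributable to the swept parameter alone. A sharp change in the final fraction between two adjacent threshold values would mark a candidate threshold in this toy setting, and parameters such as `fanout` play the role of design variables that an intervention could target.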