AI · Apr 30

The Two Boundaries: Why Behavioral AI Governance Fails Structurally

arXiv:2604.27292 · 32.76 citations

Predicted impact: top 85% in AI · last 90 days · Originality: Highly original

AI Analysis

For AI safety researchers and system architects, the paper provides a formal proof that behavioral governance layers are structurally inadequate, shifting the problem from policy design to system architecture.

The paper identifies a structural failure in behavioral AI governance: the gap between what an AI system can do and what governance covers leaves some capabilities ungoverned (risk) and some policies aimed at capabilities that do not exist (theater). It proves via Rice's theorem that this gap is undecidable for Turing-complete systems, and it proposes coterminous governance, which requires architectural separation of computation from effects, as the only way to eliminate the gap. The proofs are mechanized in Coq.
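
The gap described above can be written in set notation. This is a sketch under assumed notation, not the paper's own formalization: $E$ (the set of effects the system can perform) and $G$ (the set of effects the governance policy addresses) are symbols introduced here for illustration.

```latex
\begin{align*}
\text{governed} &= E \cap G \\
\text{risk (ungoverned capabilities)} &= E \setminus G \\
\text{theater (policies without capabilities)} &= G \setminus E \\
\text{coterminous governance} &\iff E = G
  \iff (E \setminus G) \cup (G \setminus E) = \emptyset
\end{align*}
```

Under this reading, the two failure regions together form the symmetric difference of $E$ and $G$, and the criterion asks for that set to be provably empty.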

Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice's theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property "this program's effects comply with the governance policy." We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).
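
One way to picture the separation of computation from effect is the minimal Python sketch below. It is an illustration under stated assumptions, not the paper's construction: `EffectKind`, `Effect`, `Interpreter`, and `plan_task` are hypothetical names, and the effects are simulated by log entries rather than real API calls or database writes. The computation only returns effect descriptions as data, and a single interpreter performs them, refusing to start unless every effect kind has exactly one policy. Python itself cannot stop `plan_task` from performing side effects directly, so the sketch shows the shape of the design rather than an enforced guarantee.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class EffectKind(Enum):
    """Closed vocabulary of effects (hypothetical names for illustration)."""
    API_CALL = "api_call"
    DB_WRITE = "db_write"


@dataclass(frozen=True)
class Effect:
    """A description of an effect. Constructing one performs no action."""
    kind: EffectKind
    target: str
    payload: str


# A policy decides whether one concrete effect request may run.
Policy = Callable[[Effect], bool]


class Interpreter:
    """The only component that performs effects in this sketch."""

    def __init__(self, policies: Dict[EffectKind, Policy]) -> None:
        # Require the policy table to match the effect vocabulary exactly:
        # no kind without a policy, no policy without a kind.
        missing = set(EffectKind) - set(policies)
        extra = set(policies) - set(EffectKind)
        if missing or extra:
            raise ValueError(f"boundaries differ: missing={missing}, extra={extra}")
        self._policies = dict(policies)
        self.log: List[str] = []

    def run(self, plan: List[Effect]) -> None:
        # The policy check is a step of execution, not a separate system.
        for effect in plan:
            allowed = self._policies[effect.kind](effect)
            verdict = "ran" if allowed else "denied"
            # Assumption: a real system would perform the effect here.
            self.log.append(f"{verdict} {effect.kind.value} -> {effect.target}")


def plan_task(user_request: str) -> List[Effect]:
    """Computation side: returns effect descriptions and performs none."""
    return [
        Effect(EffectKind.API_CALL, "https://example.invalid/search", user_request),
        Effect(EffectKind.DB_WRITE, "audit_table", user_request),
    ]


if __name__ == "__main__":
    interpreter = Interpreter({
        EffectKind.API_CALL: lambda e: e.target.startswith("https://"),
        EffectKind.DB_WRITE: lambda e: e.target != "audit_table",
    })
    interpreter.run(plan_task("quarterly report"))
    print("\n".join(interpreter.log))
```

Because the effect vocabulary is a closed enumeration and the interpreter rejects any policy table that does not match it, the two boundaries in this toy coincide by construction instead of being checked by analyzing arbitrary program behavior.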
