CR · AI · Apr 14

Parallax: Why AI Agents That Think Must Never Act

arXiv:2604.12986 · 56.51 citations · Has Code
Predicted impact: top 34% in CR · last 90 days · Originality: Highly original
AI Analysis

For developers and deployers of autonomous AI agents, this work addresses a critical security gap by providing an architectural solution that remains effective even when the reasoning system is compromised.

The paper identifies that prompt-level guardrails are insufficient for autonomous AI agents that execute real-world actions, and introduces Parallax, a paradigm with four principles (Cognitive-Executive Separation, Adversarial Validation with Graduated Determinism, Information Flow Control, Reversible Execution) and an open-source implementation. In evaluations, Parallax blocks 98.9% of attacks (default) and 100% (maximum security) across 280 adversarial test cases, with zero false positives.

Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modifying databases), a fundamental security gap has emerged. The dominant approach to agent safety relies on prompt-level guardrails: natural language instructions that operate at the same abstraction level as the threats they attempt to mitigate. This paper argues that prompt-based safety is architecturally insufficient for agents with execution capability and introduces Parallax, a paradigm for safe autonomous AI execution grounded in four principles: Cognitive-Executive Separation, which structurally prevents the reasoning system from executing actions; Adversarial Validation with Graduated Determinism, which interposes an independent, multi-tiered validator between reasoning and execution; Information Flow Control, which propagates data sensitivity labels through agent workflows to detect context-dependent threats; and Reversible Execution, which captures pre-destructive state to enable rollback when validation fails. We present OpenParallax, an open-source reference implementation in Go, and evaluate it using Assume-Compromise Evaluation, a methodology that bypasses the reasoning system entirely to test the architectural boundary under full agent compromise. Across 280 adversarial test cases in nine attack categories, Parallax blocks 98.9% of attacks with zero false positives under its default configuration, and 100% of attacks under its maximum-security configuration. When the reasoning system is compromised, prompt-level guardrails provide zero protection because they exist only within the compromised system; Parallax's architectural boundary holds regardless.
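The separation described in the abstract can be illustrated with a minimal Go sketch. It is not taken from OpenParallax: the type names, tool names ("read_file", "shell", "http_post"), the "secret" label, and the two-tier split are all assumptions made for illustration. The reasoning system only produces Action values as plain data; a separate Executor is the only component that can run them, and it consults a validator first. Tier 1 applies deterministic rules, including a simple information-flow check on sensitivity labels, and tier 2 is a fail-closed placeholder for whatever less deterministic check a real system might use. Reversible Execution is omitted to keep the sketch short.

```go
package main

import (
	"fmt"
	"strings"
)

// Verdict is the outcome of validating a proposed action.
type Verdict int

const (
	Allow Verdict = iota
	Escalate
	Block
)

// Action is a proposal emitted by the reasoning system. It is plain data:
// the reasoner holds no handle to anything that can execute it.
type Action struct {
	Tool   string   // hypothetical tool name, e.g. "shell" or "http_post"
	Arg    string   // tool argument (path, command, URL)
	Labels []string // sensitivity labels propagated from the data involved
}

// tier1 applies deterministic rules (illustrative, not the paper's rule set).
func tier1(a Action) Verdict {
	if a.Tool == "shell" && strings.Contains(a.Arg, "rm -rf") {
		return Block // destructive command
	}
	if a.Tool == "http_post" {
		for _, l := range a.Labels {
			if l == "secret" {
				return Block // labelled data must not reach a network sink
			}
		}
	}
	if a.Tool == "read_file" {
		return Allow
	}
	return Escalate // no deterministic rule matched
}

// tier2 is a placeholder for a slower, less deterministic check.
// In this sketch it fails closed.
func tier2(a Action) Verdict {
	return Block
}

// Validate runs the tiers in order and stops at the first firm verdict.
func Validate(a Action) Verdict {
	if v := tier1(a); v != Escalate {
		return v
	}
	return tier2(a)
}

// Executor is the only component able to perform actions.
type Executor struct {
	log []string // stand-in for real side effects
}

// Run executes an action only if the validator allows it.
func (e *Executor) Run(a Action) error {
	if Validate(a) != Allow {
		return fmt.Errorf("blocked: %s %q", a.Tool, a.Arg)
	}
	e.log = append(e.log, a.Tool+" "+a.Arg)
	return nil
}

func main() {
	// Proposals are injected directly, as if the reasoner were fully
	// compromised, in the spirit of Assume-Compromise Evaluation.
	proposals := []Action{
		{Tool: "read_file", Arg: "notes.txt"},
		{Tool: "shell", Arg: "rm -rf /"},
		{Tool: "http_post", Arg: "https://example.invalid", Labels: []string{"secret"}},
	}
	ex := &Executor{}
	for _, a := range proposals {
		if err := ex.Run(a); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println("executed:", a.Tool, a.Arg)
		}
	}
}
```

Because the proposals bypass any reasoning step, the sketch exercises only the boundary between proposal and execution, which is the property the abstract says the evaluation targets. Nothing the proposal contains can change the validator's rules, since the validator never interprets the proposal as instructions.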

