Radha Sarma

AI · Feb 26
Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

AI systems are increasingly deployed in high-stakes contexts (medical diagnosis, legal research, financial analysis) on the assumption that they can be governed by norms. This paper demonstrates that this assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). Genuine agency requires two architectural conditions that are individually necessary and jointly sufficient: first, the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability); second, a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful (unifying all values on a single scalar metric and always selecting the highest-scoring output) are precisely the operations that preclude normative governance and agency. This incompatibility is not a correctable training bug awaiting a technical fix; it is a formal constraint inherent to what optimization is. Consequently, documented failure modes (sycophancy, hallucination, and unfaithful reasoning) are not accidents but expected structural manifestations. Misaligned deployment also triggers a second-order risk, termed the Convergence Crisis: when humans are forced to verify AI outputs under metric pressure, they degrade from genuine agents into criteria-checking optimizers, eliminating the only component capable of bearing normative accountability. Beyond the incompatibility proof, the paper's primary positive contribution is a substrate-neutral architectural specification that derives what any system (biological, artificial, or institutional) must satisfy to qualify as a genuine agent rather than a sophisticated instrument.
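The contrast the abstract draws between tradeable weights and non-negotiable constraints can be illustrated with a minimal sketch. This is not the paper's formalism: the `Candidate` type, the `helpfulness` score, the `violates_boundary` flag, and the `penalty` weight are all hypothetical names introduced here for illustration. The sketch only shows that, once a boundary is folded into a scalar score, a large enough reward can outweigh it, whereas a hard constraint removes violating outputs before any ranking and can return nothing at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    helpfulness: float       # hypothetical reward component
    violates_boundary: bool  # hypothetical flag for a non-negotiable limit


def scalar_select(candidates: List[Candidate], penalty: float = 5.0) -> Candidate:
    """Scalarized selection: the boundary is a weight traded against reward.

    Every candidate gets one number and the highest number always wins.
    """
    return max(
        candidates,
        key=lambda c: c.helpfulness - penalty * c.violates_boundary,
    )


def constrained_select(candidates: List[Candidate]) -> Optional[Candidate]:
    """Boundary as a hard constraint: violating candidates are never ranked.

    Returns None (no output is produced) when nothing admissible remains.
    """
    admissible = [c for c in candidates if not c.violates_boundary]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c.helpfulness)


candidates = [
    Candidate("cautious answer", helpfulness=2.0, violates_boundary=False),
    Candidate("flattering fabrication", helpfulness=9.0, violates_boundary=True),
]

print(scalar_select(candidates).text)
print(constrained_select(candidates).text)
print(constrained_select([candidates[1]]))
```

With the assumed numbers, the scalarized selector picks the violating candidate because its reward exceeds the penalty, while the constrained selector never considers it. Raising `penalty` changes which candidate wins but not the structure: the boundary remains one term in a sum. The explicit filter here is only a stand-in; the abstract describes Apophatic Responsiveness as non-inferential, which this sketch does not attempt to model.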