A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
Identifies a structural threshold governing collapse in self-play RL, relevant for multi-agent systems and safe AI.
Self-play reinforcement learning agents collapse to near-maximal loss when all positive-reach contingent decisions are eliminated, but preserving even one such decision prevents collapse. The phenomenon is timing-invariant, reversible, and intensifies with function approximation.
We show that a threshold in decision capacity determines whether self-play reinforcement learning agents collapse under asymmetric rule perturbations. Across poker variants, matrix games, a dice game, and multiple learning algorithms, eliminating all positive-reach contingent decisions causes rapid convergence to a deterministic exploitation attractor, a fixed point at near-maximal loss. Preserving even a single positive-reach contingent decision point prevents this collapse. A frozen-baseline control and a fixed-opponent control confirm that the mechanism is co-adaptation under constraint, not the perturbation itself. The phenomenon is timing-invariant, fully reversible upon action restoration, and intensifies under function approximation. These results establish a sharp threshold at zero reach-weighted contingent action capacity, with collapse severity scaling continuously with reach-weighted capacity in the tested domains.
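The zero-capacity threshold can be illustrated in a toy matrix game. This is a minimal sketch under stated assumptions, not the authors' setup. It uses matching pennies as the game and simultaneous fictitious play as a stand-in self-play learner. It also uses a hypothetical `contingent_capacity` helper that treats the game's single decision point (reach 1) as contingent only when the player has more than one allowed action. The names `self_play`, `best_response`, and `contingent_capacity` are illustrative.

```python
"""Toy sketch: fictitious-play self-play in matching pennies, with and
without an action restriction on player 1. All names are hypothetical
illustrations, not the paper's code or definitions."""

# Payoff to player 1; player 2 receives the negation (zero-sum game).
PAYOFF = [[1, -1],
          [-1, 1]]


def contingent_capacity(allowed, reach=1.0):
    """Hypothetical reach-weighted contingent capacity for a one-shot game:
    the single decision point counts only if it offers a real choice."""
    return reach if len(allowed) > 1 else 0.0


def best_response(opp_counts, allowed, payoff_for):
    """Return the allowed action with the highest total payoff against the
    opponent's empirical action counts. Python's max() keeps the first
    maximal element, so ties go to the earliest allowed action."""
    def value(own):
        return sum(c * payoff_for(own, opp) for opp, c in enumerate(opp_counts))
    return max(allowed, key=value)


def self_play(p1_allowed, rounds=10_000):
    """Run simultaneous fictitious play for `rounds` rounds.

    Returns player 1's per-round payoffs and player 1's action counts.
    Player 2 always keeps both actions; only player 1 may be restricted."""
    p1_counts = [0, 0]
    p2_counts = [0, 0]
    p2_allowed = [0, 1]
    payoffs = []
    for t in range(rounds):
        if t == 0:
            # No history yet: each player opens with its first allowed action.
            a1, a2 = p1_allowed[0], p2_allowed[0]
        else:
            # Both players adapt to each other's empirical play (co-adaptation).
            a1 = best_response(p2_counts, p1_allowed,
                               lambda own, opp: PAYOFF[own][opp])
            a2 = best_response(p1_counts, p2_allowed,
                               lambda own, opp: -PAYOFF[opp][own])
        p1_counts[a1] += 1
        p2_counts[a2] += 1
        payoffs.append(PAYOFF[a1][a2])
    return payoffs, p1_counts


if __name__ == "__main__":
    for allowed in ([0], [0, 1]):
        payoffs, _ = self_play(allowed)
        tail = payoffs[len(payoffs) // 2:]  # late-run behaviour only
        print(f"allowed={allowed} capacity={contingent_capacity(allowed)} "
              f"P1 late-run mean payoff={sum(tail) / len(tail):+.3f}")
```

With one allowed action (capacity 0), player 2 locks onto the exploiting action after the first round. Player 1 then loses every subsequent round, so its late-run mean payoff is exactly -1, the maximal loss in this game. With both actions available (capacity 1), the two learners cycle, and the late-run mean payoff stays close to 0, the game's value. The toy reproduces only the qualitative zero-capacity threshold. It does not model the timing, reversibility, or function-approximation findings.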