LG · AI · Nov 16, 2025

The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation

arXiv:2511.12804v1 · 1 citation
Originality: Highly original
AI Analysis

This work addresses a foundational challenge that matters to both developers and policymakers: maintaining alignment in AI systems that train on their own outputs. The work is incremental in that it builds on existing models such as Bradley-Terry.

The paper tackles long-term alignment in self-consuming generative models by analyzing recursive retraining under a two-stage curation mechanism. It identifies three convergence regimes and proves an impossibility theorem: no such mechanism can simultaneously preserve diversity, ensure symmetric influence, and eliminate dependence on initialization.

In self-consuming generative models that train on their own outputs, alignment with user preferences becomes a recursive rather than one-time process. We provide the first formal foundation for analyzing the long-term effects of such recursive retraining on alignment. Under a two-stage curation mechanism based on the Bradley-Terry (BT) model, we model alignment as an interaction between two factions: the Model Owner, who filters which outputs should be learned by the model, and the Public User, who determines which outputs are ultimately shared and retained through interactions with the model. Our analysis reveals three structural convergence regimes depending on the degree of preference alignment: consensus collapse, compromise on shared optima, and asymmetric refinement. We prove a fundamental impossibility theorem: no recursive BT-based curation mechanism can simultaneously preserve diversity, ensure symmetric influence, and eliminate dependence on initialization. Framing the process as dynamic social choice, we show that alignment is not a static goal but an evolving equilibrium, shaped by both power asymmetries and path dependence.

