LG, AI · Mar 11, 2025

Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models

arXiv:2503.08117v1 · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a critical issue for AI systems deployed in real-world settings such as social media, where models interact iteratively. The contribution is incremental, however, because it builds on prior work on self-consuming models.

The paper tackles model collapse in co-evolving generative models, such as those in multimodal AI ecosystems, by analyzing their convergence dynamics. It shows that mutual reinforcement between the models accelerates collapse, and that stabilization strategies such as random corpus injections (for the text model) or user-content injections (for the image model) prevent it while preserving diversity and fidelity.
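The collapse-versus-injection contrast on the text side can be illustrated with a toy simulation. This is a minimal sketch, not the paper's method: it assumes the text model is a multinomial over `k` captions that is refit by maximum likelihood on `n_samples` of its own outputs each generation, and it models random corpus injection as a hypothetical convex mix with a fixed uniform corpus distribution `q`, weighted by `inject`. All function and parameter names are illustrative.

```python
import math
import random
from collections import Counter


def entropy(p):
    """Shannon entropy (nats): a simple proxy for caption diversity."""
    return sum(-x * math.log(x) for x in p if x > 0)


def refit(p, n_samples, rng):
    """Draw n_samples captions from p and return the MLE (empirical frequencies)."""
    k = len(p)
    draws = rng.choices(range(k), weights=p, k=n_samples)
    counts = Counter(draws)
    return [counts[i] / n_samples for i in range(k)]


def run(k=5, n_samples=50, generations=2000, inject=0.0, seed=0):
    """Retrain a multinomial text model on its own outputs for many generations.

    `inject` is a hypothetical mixing weight for a fixed uniform corpus q,
    standing in for random corpus injection (the paper's scheme may differ).
    Returns (number of captions with nonzero probability, entropy).
    """
    rng = random.Random(seed)
    q = [1.0 / k] * k  # assumed external corpus distribution
    p = list(q)        # start at maximal diversity
    for _ in range(generations):
        p_hat = refit(p, n_samples, rng)
        p = [(1 - inject) * a + inject * b for a, b in zip(p_hat, q)]
    support = sum(1 for x in p if x > 0)
    return support, entropy(p)


print(run(inject=0.0))  # purely self-consuming loop
print(run(inject=0.1))  # loop with corpus injection
```

Without injection the loop behaves like neutral genetic drift: a caption that draws zero samples in some generation is lost permanently, so the support shrinks to a single caption. With `inject > 0`, every caption keeps probability at least `inject / k`, so none can disappear.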

The increasing prevalence of synthetic data in training loops has raised concerns about model collapse, where generative models degrade when trained on their own outputs. While prior work focuses on this self-consuming process, we study an underexplored yet prevalent phenomenon: co-evolving generative models that shape each other's training through iterative feedback. This is common in multimodal AI ecosystems, such as social media platforms, where text models generate captions that guide image models, and the resulting images influence the future adaptation of the text model. We take a first step by analyzing such a system, modeling the text model as a multinomial distribution and the image model as a conditional multi-dimensional Gaussian distribution. Our analysis uncovers three key results. First, when one model remains fixed, the other collapses: a frozen image model causes the text model to lose diversity, while a frozen text model leads to an exponential contraction of image diversity, though fidelity remains bounded. Second, in fully interactive systems, mutual reinforcement accelerates collapse, with image contraction amplifying text homogenization and vice versa, leading to a Matthew effect where dominant texts sustain higher image diversity while rarer texts collapse faster. Third, we analyze stabilization strategies implicitly introduced by real-world external influences. Random corpus injections for text models and user-content injections for image models prevent collapse while preserving both diversity and fidelity. Our theoretical findings are further validated through experiments.
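The exponential contraction of image diversity, and a simplified frozen-text version of the Matthew effect, can be illustrated with expected-value arithmetic instead of a full simulation. This is a minimal sketch under stated assumptions, not the paper's derivation: each caption's image model is an isotropic Gaussian refit by maximum likelihood on `n_c` of its own samples, with `n_c` proportional to the caption's probability under a frozen text model. Because the MLE variance is biased by the factor (n_c - 1)/n_c, the expected variance after t generations is ((n_c - 1)/n_c)^t times its initial value. The `inject`/`v_user` mixing rule is a hypothetical stand-in for user-content injection, and the caption names and probabilities are invented for illustration.

```python
def expected_variance(n_images, generations, v0=1.0, inject=0.0, v_user=1.0):
    """Expected per-dimension variance of one caption's Gaussian image model.

    Assumes each generation refits the Gaussian by maximum likelihood on
    n_images of its own samples, so E[v_next] = (n_images - 1) / n_images * v.
    `inject` is a hypothetical weight pulling the variance toward fresh
    user content with variance `v_user` (stand-in for user-content injection).
    """
    shrink = (n_images - 1) / n_images
    v = v0
    for _ in range(generations):
        v = (1 - inject) * shrink * v + inject * v_user
    return v


# Hypothetical frozen text model: caption probabilities.
caption_probs = {"dominant": 0.6, "common": 0.3, "rare": 0.1}
total_images = 1000
generations = 200

collapse = {}
for caption, p in caption_probs.items():
    n_c = round(total_images * p)  # images trained per caption per generation
    collapse[caption] = expected_variance(n_c, generations)

# Dominant captions retain more image diversity than rare ones.
print(collapse)

# With injection, the rare caption's variance settles at a positive fixed point.
stabilized = expected_variance(100, 2000, inject=0.1)
print(stabilized)
```

Rarer captions receive fewer images per generation, so their expected variance shrinks faster, in line with the abstract's observation that rarer texts collapse faster. With injection the recursion has the positive fixed point `inject * v_user / (1 - (1 - inject) * shrink)`, so the variance no longer decays to zero.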
