LG AI CY May 27, 2025

What happens when generative AI models train recursively on each others' outputs?

arXiv:2505.21677v37.11 citations h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses a problem of growing importance to society as reliance on generative AI increases, highlighting both the potential risks and the potential benefits of recursive training.

The paper investigates the effects of generative AI models training on each other's outputs, finding that such data-mediated interactions can expose models to novel concepts but also homogenize their performance on shared tasks.

The internet serves as a common source of training data for generative AI (genAI) models but is increasingly populated with AI-generated content. This duality raises the possibility that future genAI models may be trained on other models' generated outputs. Prior work has studied consequences of models training on their own generated outputs, but limited work has considered what happens if models ingest content produced by other models. Given society's increasing dependence on genAI tools, understanding such data-mediated model interactions is critical. This work provides empirical evidence for how data-mediated interactions might unfold in practice, develops a theoretical model for this interactive training process, and experimentally validates the theory. We find that data-mediated interactions can benefit models by exposing them to novel concepts perhaps missed in original training data, but also can homogenize their performance on shared tasks.
