LG | DIS-NN | DATA-AN | ML | Jun 25, 2025

Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning

arXiv:2506.20623v2
h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the risk of bias amplification in AI systems that train on self-generated data. The advance is rated incremental because it builds on existing exponential family theory.

The paper studies closed-loop learning in exponential families, where a model is repeatedly re-estimated from data that it generated itself. It shows that maximum likelihood estimation drives the parameters to absorbing states that amplify initial biases. It also shows that this outcome can be prevented, for example with maximum a posteriori estimation or regularization.

Closed-loop learning is the process of repeatedly estimating a model from data generated from the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be primarily trained with data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motion that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows sufficient statistics with the martingale property and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented if the data contains at least one data point generated from a ground truth model, by relying on maximum a posteriori estimation or by introducing regularisation.
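The dynamics described in the abstract can be illustrated with the simplest exponential family, a Bernoulli model. The following is a minimal sketch, not the paper's code: the function name `closed_loop_bernoulli` and the parameters `prior_strength` and `prior_mean` are hypothetical choices made for this illustration. With `prior_strength = 0` the update is the maximum likelihood estimate (the sample mean), whose conditional expectation equals the current parameter, so the parameter performs a martingale random walk until it hits 0 or 1 and stays there. With `prior_strength > 0` the update is the MAP estimate under an assumed Beta(1 + s·m, 1 + s·(1 − m)) prior, which keeps the estimate strictly inside (0, 1).

```python
import random


def closed_loop_bernoulli(p0, n_samples, n_generations,
                          prior_strength=0.0, prior_mean=0.5, seed=0):
    """Re-estimate a Bernoulli parameter from its own samples, generation by generation.

    Hypothetical illustration, not the paper's implementation.
    prior_strength = 0  -> maximum likelihood estimate (sample mean).
    prior_strength = s  -> MAP estimate under a Beta(1 + s*m, 1 + s*(1 - m)) prior,
                           where m = prior_mean (an assumed prior for this sketch).
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(n_generations):
        # Generate a fresh dataset from the current model.
        successes = sum(rng.random() < p for _ in range(n_samples))
        # Re-estimate the parameter from that dataset only.
        p = (successes + prior_strength * prior_mean) / (n_samples + prior_strength)
        trajectory.append(p)
    return trajectory


# Closed loop with maximum likelihood: 0 and 1 are absorbing states.
mle_run = closed_loop_bernoulli(p0=0.5, n_samples=10, n_generations=2000)

# Closed loop with the MAP update: the estimate cannot reach 0 or 1.
map_run = closed_loop_bernoulli(p0=0.5, n_samples=10, n_generations=2000,
                                prior_strength=2.0)

print("final MLE parameter:", mle_run[-1])
print("final MAP parameter:", map_run[-1])
```

In this sketch the small sample size makes absorption fast under maximum likelihood. Which boundary is reached depends on the random seed, which is the sense in which an early fluctuation gets locked in.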
