LG · May 19

A dynamical systems view of training generative models and the memorization phenomenon

arXiv:2605.19483 · 56.5
Predicted impact top 37% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For machine learning theorists, this offers a novel theoretical perspective on memorization, but it is primarily a conceptual framework without empirical validation or quantitative results.

This paper provides a dynamical systems explanation for the memorization phenomenon in generative models, using a stylized loss function with two time scales to analyze constant step-size SGD. It shows how this perspective links memorization to collapse and double descent phenomena.

Using recent works of one of the authors (VSB) on collapse in generative models and two-time-scale dynamics in stochastic gradient descent in high dimensions, we give a system-theoretic explanation of the memorization phenomenon in generative models. This relies purely on the dynamic aspects of the training phase. Specifically, we use a result of Austin [2016] to motivate a stylized model for the loss function for stochastic gradient descent (SGD) wherein the loss function has a strong dependence on some variables and a weak dependence on the rest in a precise sense. This naturally leads to two distinct time scales in the constant step-size SGD that is commonly used in machine learning. This fact has been used to explain the double descent phenomenon in SGD in Borkar [2026]. In conjunction with a mathematical model for the collapse phenomenon in SGD developed in Borkar [2025a], we analyze the constant step-size SGD using the recent results of Azizian et al. [2024] in order to explain the phenomenon of memorization, wherein a generative model that is concurrently being tuned yields the same or similar outputs for significant stretches of time. This gives a novel perspective on the aforementioned phenomena reported in the machine learning literature and their interrelationships, using a dynamical systems viewpoint.
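To make the two-time-scale idea concrete, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical toy loss f(x, y) = x²/2 + eps·y²/2, which depends strongly on x and only weakly on y, and runs constant step-size SGD with additive Gaussian gradient noise. The function name `two_time_scale_sgd` and all parameter values (`eps`, `step_size`, `noise_std`) are illustrative assumptions rather than the stylized model motivated by Austin [2016].

```python
import random


def two_time_scale_sgd(steps=200, step_size=0.1, eps=0.01, noise_std=0.01, seed=0):
    """Constant step-size SGD on the toy loss f(x, y) = x^2/2 + eps*y^2/2.

    x is the "strong" (fast) variable, y the "weak" (slow) one.
    All names and values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    path = [(x, y)]
    for _ in range(steps):
        # Noisy gradients: the y-gradient is scaled down by eps.
        grad_x = x + rng.gauss(0.0, noise_std)
        grad_y = eps * y + rng.gauss(0.0, noise_std)
        # The same constant step size is applied to both variables.
        x -= step_size * grad_x
        y -= step_size * grad_y
        path.append((x, y))
    return path


path = two_time_scale_sgd()
x_final, y_final = path[-1]
print(f"fast variable x after {len(path) - 1} steps: {x_final:.4f}")
print(f"slow variable y after {len(path) - 1} steps: {y_final:.4f}")
```

Under these assumptions, x contracts by a factor of about 0.9 per step and settles near zero almost immediately, while y contracts by only about 0.999 per step and stays close to its starting value over the same horizon. A single constant step size therefore produces two effective time scales. The sketch does not model collapse or memorization themselves.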

