ML | LG | Jan 13, 2025

Gaussian Universality for Diffusion Models

arXiv:2501.07741v3 | h-index: 5 | IEEE Signal Processing Letters
Originality: Synthesis-oriented
AI Analysis

This work provides a theoretical foundation for evaluating models trained on diffusion-generated data. The contribution is incremental: it extends existing universality results to a new data type.

The paper analyzes the performance of models trained on synthetic data generated by diffusion models. It shows that, for linear classifiers, the test error depends only on the first- and second-order statistics of the data, so it matches the test error obtained on a Gaussian mixture with the same per-class means and covariances.

We investigate Gaussian Universality for data distributions generated via diffusion models. By Gaussian Universality we mean that the test error of a generalized linear model $f(\mathbf{W})$ trained for a classification task on the diffusion data matches the test error of $f(\mathbf{W})$ trained on the Gaussian Mixture with matching means and covariances per class. In other words, the test error depends only on the first and second order statistics of the diffusion-generated data in the linear setting. As a corollary, the analysis of the test error for linear classifiers trained on diffusion-generated data can be reduced to the analysis on Gaussian data. Analysing the performance of models trained on synthetic data is a pertinent problem due to the surge of methods such as \cite{sehwag2024stretchingdollardiffusiontraining}. Moreover, we show that, for any $1$-Lipschitz scalar function $\varphi$, $\varphi(\mathbf{x})$ is close to $\mathbb{E}\,\varphi(\mathbf{x})$ with high probability for $\mathbf{x}$ sampled from the conditional diffusion model corresponding to each class. Finally, we note that current approaches for proving universality do not apply to diffusion-generated data, as the covariance matrices of the data tend to have vanishing minimum singular values, contrary to the assumption made in the literature. This leaves extending previous mathematical universality results as an intriguing open question.

Code Implementations: 1 repo