CV · Sep 11, 2025

Locality in Image Diffusion Models Emerges from Data Statistics

arXiv:2509.09672v2 · 21 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work provides insights into the fundamental mechanisms of diffusion models for researchers in generative AI, though it is incremental as it builds on prior locality findings.

The study asks why image diffusion models learn local behavior. It shows that locality emerges from pixel correlations in the dataset rather than from neural network inductive biases, and supports this with an analytical denoiser that matches deep model scores better than prior alternatives.

Recent work has shown that the generalization ability of image diffusion models arises from the locality properties of the trained neural network. In particular, when denoising a particular pixel, the model relies on a limited neighborhood of the input image around that pixel, which, according to the previous work, is tightly related to the ability of these models to produce novel images. Since locality is central to generalization, it is crucial to understand why diffusion models learn local behavior in the first place, as well as the factors that govern the properties of locality patterns. In this work, we present evidence that the locality in deep diffusion models emerges as a statistical property of the image dataset and is not due to the inductive bias of convolutional neural networks, as suggested in previous work. Specifically, we demonstrate that an optimal parametric linear denoiser exhibits similar locality properties to deep neural denoisers. We show, both theoretically and experimentally, that this locality arises directly from pixel correlations present in the image datasets. Moreover, locality patterns are drastically different on specialized datasets, approximating principal components of the data's covariance. We use these insights to craft an analytical denoiser that better matches scores predicted by a deep diffusion model than prior expert-crafted alternatives. Our key takeaway is that while neural network architectures influence generation quality, their primary role is to capture locality patterns inherent in the data.
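The claim that a linear denoiser inherits locality from pixel correlations can be illustrated with a toy example. The sketch below is not the paper's analytical denoiser. It assumes zero-mean 1D "images" with a hypothetical AR(1) covariance, cov[i, j] = rho^|i-j|, and the noise model y = x + sigma * noise. Under those assumptions the MMSE-optimal linear denoiser is the standard Wiener form W = cov (cov + sigma^2 I)^{-1}, and row i of W shows how much each input pixel contributes to denoised pixel i. All names (`ar1_covariance`, `linear_denoiser_weights`, `local_mass`) are illustrative.

```python
import numpy as np

def ar1_covariance(d, rho):
    """Toy covariance (an assumption): correlation decays as rho^|i-j|."""
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def linear_denoiser_weights(cov, sigma):
    """Wiener weights W = cov @ inv(cov + sigma^2 I) for y = x + sigma * noise."""
    d = cov.shape[0]
    a = cov + sigma**2 * np.eye(d)
    # solve(a, cov) gives inv(a) @ cov; both matrices are symmetric,
    # so its transpose equals cov @ inv(a).
    return np.linalg.solve(a, cov).T

def local_mass(weights, pixel, radius):
    """Fraction of absolute weight for `pixel` that lies within `radius` of it."""
    row = np.abs(weights[pixel])
    return row[pixel - radius : pixel + radius + 1].sum() / row.sum()

d = 64
center = d // 2
cov = ar1_covariance(d, rho=0.9)

w_low = linear_denoiser_weights(cov, sigma=0.5)   # low noise
w_high = linear_denoiser_weights(cov, sigma=2.0)  # high noise

mass_low = local_mass(w_low, center, radius=5)
mass_high = local_mass(w_high, center, radius=5)
print(f"weight within 5 pixels: low noise {mass_low:.3f}, high noise {mass_high:.3f}")
```

No architecture appears anywhere in this sketch: the weights are determined entirely by the covariance and the noise level. In this toy setting the weights concentrate around the pixel being denoised, and the effective neighborhood widens as sigma grows. Whether the same holds quantitatively for real image datasets is what the paper investigates.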
