When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark

Wenjie Xi

arXiv:2606.09705v16.0

Originality: Incremental advance

AI Analysis

This work provides a diagnostic theory and benchmark for understanding when local score models can extrapolate across system sizes, which is crucial for scientific generative modeling.

The paper shows that architectural locality alone does not guarantee stable size extrapolation in generative models; stable extrapolation instead depends on the quasi-locality of the Gaussian-smoothed score. The authors formalize this mechanism and introduce a diagnostic benchmark, Finite-Depth Local Flow (FDLF), to validate the interplay between spatial mixing, score locality, and model receptive fields.

Scientific generative modeling often requires size transfer, where models trained on small systems are evaluated on larger ones. While translation-invariant architectures enable this evaluation, we show that architectural locality alone does not guarantee stable size extrapolation. Instead, stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance, meaning a local model succeeds only if its receptive field covers the smoothed score's response range. We formalize this mechanism, proving a size-uniform comparison theorem for local marginals under reverse diffusion. We also introduce Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges. Empirically, we validate the interplay between spatial mixing, smoothed-score quasi-locality, and model receptive fields. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation. Conversely, when spatial mixing weakens, the score's locality rapidly degrades, causing size transfer to fail.
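
The posterior-covariance mechanism can be illustrated on a Gaussian toy model. This is not the paper's FDLF benchmark, only a minimal sketch. Assuming the noising convention x = x0 + sigma * eps with x0 ~ N(0, Sigma), the smoothed score is -(Sigma + sigma^2 I)^{-1} x, and differentiating Tweedie's formula gives the score Jacobian Cov[x0 | x] / sigma^4 - I / sigma^2. The code below uses a hypothetical 1D chain with covariance rho^|i-j| as a stand-in for spatial mixing (small rho means correlations decay quickly). All function names, the tolerance, and the parameter values are illustrative assumptions.

```python
import numpy as np


def chain_covariance(n, rho):
    """Toy prior covariance of a 1D chain: cov[i, j] = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def smoothed_score_jacobian(cov, sigma):
    """Jacobian of the score of N(0, cov) convolved with N(0, sigma^2 I).

    The smoothed density is N(0, cov + sigma^2 I), so the score is linear
    in x and its Jacobian is the negative precision matrix.
    """
    n = cov.shape[0]
    return -np.linalg.inv(cov + sigma**2 * np.eye(n))


def posterior_covariance(cov, sigma):
    """Cov[x0 | x] for x = x0 + sigma * eps in the Gaussian toy model."""
    n = cov.shape[0]
    gain = cov @ np.linalg.inv(cov + sigma**2 * np.eye(n))
    return cov - gain @ cov


def response_range(jac, site, tol=1e-3):
    """Largest distance |j - site| at which |d s_site / d x_j| still
    exceeds tol times the magnitude of the diagonal entry."""
    row = np.abs(jac[site])
    significant = np.nonzero(row > tol * row[site])[0]
    return int(np.max(np.abs(significant - site)))


if __name__ == "__main__":
    n, sigma, site = 65, 1.0, 32
    for rho in (0.3, 0.95):  # fast vs. slow decay of correlations
        cov = chain_covariance(n, rho)
        jac = smoothed_score_jacobian(cov, sigma)
        # Check the Tweedie-derived identity: J = Cov[x0|x]/sigma^4 - I/sigma^2
        tweedie = posterior_covariance(cov, sigma) / sigma**4 - np.eye(n) / sigma**2
        assert np.allclose(jac, tweedie)
        print(f"rho={rho}: response range = {response_range(jac, site)} sites")
```

In this toy setting, the response range printed for the slowly decaying chain is larger than for the rapidly decaying one. A local model whose receptive-field radius is smaller than that range cannot represent the far-site terms of the score Jacobian, which is the failure mode the abstract describes.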
