LG · Mar 10

Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

arXiv:2603.09936v1 · 9.99 citations

Predicted impact top 17% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work gives researchers in generative modeling a theoretical account of drifting methods, whose success has so far been largely empirical. It is incremental in that it builds on existing score-matching frameworks.

The paper addresses the theoretical foundations of Generative Modeling via Drifting. It shows that, under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions, which resolves the key questions left open in the original work. The resulting spectral analysis reveals frequency-dependent convergence bottlenecks and motivates an exponential bandwidth annealing schedule that reduces convergence time from exponential to logarithmic in the maximum frequency $K_{\max}$.
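The score-difference claim can be checked numerically in one dimension. The sketch below assumes the drift takes a mean-shift form, i.e. the normalized Gaussian-kernel average of `y - x` over samples of `p` minus the same average over samples of `q`; this form, the sign convention, and all function names are assumptions made for illustration and are not taken from the paper. For Gaussian `p` and `q`, the smoothed score is available in closed form, so the Monte Carlo drift can be compared against `sigma**2` times the difference of smoothed scores.

```python
import numpy as np

def mean_shift(x, samples, sigma):
    """Normalized Gaussian-kernel average of (y - x) over the samples y."""
    d = samples - x
    logw = -0.5 * d**2 / sigma**2
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return np.sum(w * d) / np.sum(w)

def smoothed_gaussian_score(x, mu, s, sigma):
    """Score of N(mu, s^2) convolved with N(0, sigma^2), i.e. N(mu, s^2 + sigma^2)."""
    return -(x - mu) / (s**2 + sigma**2)

rng = np.random.default_rng(0)
sigma = 0.5   # kernel bandwidth (illustrative value)
x = 0.3       # query point (illustrative value)

# Illustrative choices: p = N(1, 1^2), q = N(-0.5, 2^2)
p_samples = rng.normal(1.0, 1.0, 200_000)
q_samples = rng.normal(-0.5, 2.0, 200_000)

# Assumed mean-shift form of the drift: pull toward p minus pull toward q
drift = mean_shift(x, p_samples, sigma) - mean_shift(x, q_samples, sigma)

# sigma^2 times the difference of the smoothed scores
score_diff = sigma**2 * (
    smoothed_gaussian_score(x, 1.0, 1.0, sigma)
    - smoothed_gaussian_score(x, -0.5, 2.0, sigma)
)

print(f"mean-shift drift: {drift:.4f}")
print(f"score difference: {score_diff:.4f}")
```

The two printed values should agree up to Monte Carlo error, which reflects the standard identity that a Gaussian-kernel mean shift equals $\sigma^2 \nabla \log (p * \mathcal{N}(0, \sigma^2))$.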

Generative Modeling via Drifting has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet the success is largely empirical and its theoretical foundations remain poorly understood. In this paper, we make the following observation: \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This insight allows us to answer all three key questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the well-studied score-matching family and enable a rich theoretical perspective. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, explaining the empirical preference for the Laplacian kernel. We also propose an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is derived directly from the frozen-field discretization mandated by the JKO scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, demonstrated with a Sinkhorn divergence drift.
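The annealing schedule in the abstract is easy to tabulate. The sketch below implements $\sigma(t)=\sigma_0 e^{-rt}$ and, as a rough heuristic that is an assumption here and not the paper's argument, computes the time at which the bandwidth shrinks to $1/K$ for a frequency $K$. Under that heuristic the time grows like $\log K$, which is consistent in spirit with the $O(\log K_{\max})$ rate the abstract states. The function names and parameter values are hypothetical.

```python
import math

def sigma_schedule(t, sigma0, r):
    """Exponential bandwidth annealing: sigma(t) = sigma0 * exp(-r * t)."""
    return sigma0 * math.exp(-r * t)

def time_to_reach_scale(k, sigma0, r):
    """Time at which sigma(t) = 1/k (heuristic criterion, assumed for illustration)."""
    return math.log(sigma0 * k) / r

sigma0, r = 1.0, 0.5  # illustrative values, not taken from the paper

for k in (10, 100, 1000):
    t = time_to_reach_scale(k, sigma0, r)
    print(f"K = {k:5d}: sigma reaches 1/K at t = {t:.2f}")
```

Each tenfold increase in `K` adds the same constant `log(10) / r` to the time, which is the logarithmic scaling in question.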

