LG · ML · Jun 12, 2023

Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

arXiv:2306.07221v1 · 24 citations · h-index: 40
Originality: Incremental advance
AI Analysis

This work closes a gap in prior analyses of MFLD, which assumed the infinite-particle or continuous-time limit, by providing a general convergence framework that accounts for finite particles, time discretization, and stochastic gradients. This makes it relevant to researchers in optimization and machine learning who work with distribution-dependent dynamics.

The paper analyzes mean-field Langevin dynamics (MFLD) under practical constraints: finite particles, time discretization, and stochastic gradients. It establishes quantitative convergence rates to the regularized global optimum for problems such as mean-field neural networks and MMD minimization, and it obtains improved rates for SGD and SVRG when specialized to the standard Langevin dynamics.

The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. However, all prior analyses assumed the infinite-particle or continuous-time limit, and cannot handle stochastic gradient updates. We provide a general framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and stochastic gradient approximation. To demonstrate the wide applicability of this framework, we establish quantitative convergence rate guarantees to the regularized global optimal solution under (i) a wide range of learning problems such as neural networks in the mean-field regime and MMD minimization, and (ii) different gradient estimators including SGD and SVRG. Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
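To make the three approximations named in the abstract concrete, here is a minimal NumPy sketch of one update of a finite-particle, time-discretized MFLD with a mini-batch (SGD-style) gradient. It is an illustration under stated assumptions, not the paper's algorithm or code: the tanh neuron, the squared loss, the quadratic regularizer `lam1 * ||x||^2`, the function name `mfld_step`, and all parameter names are hypothetical choices made for this example.

```python
import numpy as np

def mfld_step(X, data_z, data_y, eta, lam, lam1, batch_size, rng):
    """One noisy gradient step on N particles (the rows of X).

    Assumed setup (illustrative only):
      - mean-field two-layer net f(z) = (1/N) * sum_i tanh(<x_i, z>)
      - squared loss, estimated on a random mini-batch (stochastic gradient)
      - regularizer lam1 * ||x||^2 and entropy strength lam
    """
    # Stochastic gradient: subsample the training set.
    idx = rng.choice(len(data_z), size=batch_size, replace=False)
    zb, yb = data_z[idx], data_y[idx]

    # Finite-particle approximation: the empirical measure of the rows of X
    # replaces the true distribution in the network output.
    act = np.tanh(zb @ X.T)              # shape (batch, N)
    resid = act.mean(axis=1) - yb        # shape (batch,)

    # Gradient of the loss's first variation, evaluated at each particle:
    # (1/batch) * sum_b resid_b * (1 - tanh^2(<x_i, z_b>)) * z_b
    grad = ((resid[:, None] * (1.0 - act**2)).T @ zb) / batch_size  # (N, d)
    drift = grad + 2.0 * lam1 * X

    # Time discretization: Euler-Maruyama step with Gaussian noise
    # scaled by sqrt(2 * lam * eta).
    noise = rng.standard_normal(X.shape)
    return X - eta * drift + np.sqrt(2.0 * lam * eta) * noise


# Usage on synthetic data (all values arbitrary).
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 3))         # 50 inputs in dimension 3
y = np.tanh(Z[:, 0])                     # toy regression targets
X = rng.standard_normal((20, 3))         # 20 particles (neurons)
for _ in range(100):
    X = mfld_step(X, Z, y, eta=0.1, lam=0.01, lam1=0.01,
                  batch_size=10, rng=rng)
```

Each particle's drift depends on all the others through `act.mean(axis=1)`, which is the distribution-dependent term that distinguishes MFLD from standard Langevin dynamics. Replacing the mini-batch gradient with a variance-reduced estimator would give an SVRG-style variant, which this sketch does not attempt.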

