ML · LG · PR · Jan 25, 2022

Convex Analysis of the Mean Field Langevin Dynamics

arXiv:2201.10469v2 · 85 citations
AI Analysis

This work addresses the theoretical convergence properties of gradient descent in mean field neural networks, offering incremental insights by adapting existing techniques.

The paper analyzes the convergence rate of mean field Langevin dynamics for infinitely wide neural networks, providing a concise theory that parallels classical convex optimization results and enables efficient empirical evaluation.

As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime, and hence the convergence properties of the dynamics are of great theoretical interest. In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings. The key ingredient of our proof is a proximal Gibbs distribution $p_q$ associated with the dynamics, which, in combination with techniques in [Vempala and Wibisono (2019)], allows us to develop a simple convergence theory parallel to classical results in convex optimization. Furthermore, we reveal that $p_q$ connects to the duality gap in the empirical risk minimization setting, which enables efficient empirical evaluation of the algorithm's convergence.
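
For orientation, the objects named in the abstract can be sketched in standard mean field notation. The symbols below are assumptions introduced for illustration and do not appear in the text above: $F$ is a convex functional of the parameter distribution $q$, $\lambda > 0$ is the entropy regularization strength, $\eta$ is a step size, and $\alpha$ is a log-Sobolev constant assumed to hold uniformly for $p_q$. Under those assumptions, a convergence statement of the kind the abstract describes would take roughly the following form.

```latex
% Entropy-regularized objective over distributions q (notation assumed)
\mathcal{L}(q) = F(q) + \lambda \int q(\theta) \log q(\theta)\, d\theta

% Proximal Gibbs distribution associated with q
p_q(\theta) \propto \exp\!\left( -\frac{1}{\lambda}\, \frac{\delta F}{\delta q}(q)(\theta) \right)

% Mean field Langevin dynamics (continuous time), with \theta_t \sim q_t
d\theta_t = -\nabla \frac{\delta F}{\delta q}(q_t)(\theta_t)\, dt + \sqrt{2\lambda}\, dW_t

% Noisy gradient descent (discrete time), with \xi^{(k)} \sim \mathcal{N}(0, I)
\theta^{(k+1)} = \theta^{(k)} - \eta\, \nabla \frac{\delta F}{\delta q}(q^{(k)})(\theta^{(k)}) + \sqrt{2 \lambda \eta}\, \xi^{(k)}

% Linear convergence in continuous time, assuming p_q satisfies a
% log-Sobolev inequality with constant \alpha for every q along the flow
\mathcal{L}(q_t) - \mathcal{L}(q_*) \le e^{-2 \alpha \lambda t} \left( \mathcal{L}(q_0) - \mathcal{L}(q_*) \right)
```

The parallel with convex optimization is that the log-Sobolev condition on $p_q$ plays the role a Polyak-Lojasiewicz-type condition plays in finite dimensions: it turns the dissipation of $\mathcal{L}$ along the flow into a bound proportional to the optimality gap.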

