MATH-PH IT LG MLOct 12, 2022

Rigorous dynamical mean field theory for stochastic gradient descent methods

Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

arXiv:2210.06591v312.637 citationsh-index: 60Has Code

Originality Incremental advance

AI Analysis

This provides a rigorous theoretical foundation for analyzing stochastic gradient descent methods in high-dimensional settings, which is incremental but important for understanding optimization in machine learning.

The authors derived closed-form equations for the high-dimensional asymptotics of first-order gradient-based methods, including SGD and Nesterov acceleration, applied to learning estimators from Gaussian data with empirical risk minimization, showing that these equations match those from discretized dynamical mean-field theory.

We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.

View on arXiv PDF Code

Similar