LG AP ML, Jun 4, 2021

Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis

arXiv:2106.02588v2, 40 citations
Originality: Synthesis-oriented
AI Analysis

The paper offers machine learning practitioners theoretical insight into optimization dynamics, but the contribution is incremental because it builds on prior continuous-time models of SGD.

The paper studies how stochastic gradient descent (SGD) with machine-learning-type noise selects minima in a continuous-time model. It shows that, in a certain noise regime, the algorithm prefers 'flat' minima in a sense that differs from the flat-minimum selection of continuous-time SGD with homogeneous noise.

The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters of these are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm. In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat minimum selection of continuous time SGD with homogeneous noise.
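
The contrast between the two noise models can be illustrated with a toy simulation. The sketch below is an assumption-laden reading of the abstract, not the paper's exact model: it takes 'machine learning scaling' to mean that the noise variance is proportional to the objective value f(x), so the noise vanishes wherever f = 0, while homogeneous noise has constant variance everywhere. The objective `f`, the parameters `eta`, `sigma`, and `dt`, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def f(x):
    # Hypothetical toy objective with global minimum value 0 at x = 0.
    return 0.5 * x * x


def grad_f(x):
    # Derivative of the toy objective above.
    return x


def ml_noise(x):
    # Assumed 'machine learning scaling': variance proportional to f(x).
    return f(x)


def homogeneous_noise(x):
    # Constant variance, independent of the position x.
    return 1.0


def euler_maruyama(x0, noise_scale, eta=0.1, sigma=1.0, dt=0.01,
                   steps=1000, seed=0):
    """Euler-Maruyama discretization of
    dx = -f'(x) dt + sqrt(eta * sigma * noise_scale(x)) dW.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        # Brownian increment over a time step of length dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        diffusion = math.sqrt(eta * sigma * noise_scale(x))
        x = x - grad_f(x) * dt + diffusion * dw
        path.append(x)
    return path


# Start both processes exactly at the global minimizer x = 0.
ml_path = euler_maruyama(0.0, ml_noise)
hom_path = euler_maruyama(0.0, homogeneous_noise)

print("ML-type noise stays at the minimizer:", all(x == 0.0 for x in ml_path))
print("Homogeneous noise leaves the minimizer:", any(x != 0.0 for x in hom_path))
```

Under these assumptions, a global minimizer with f = 0 is a rest point of the dynamics with ML-type noise, whereas homogeneous noise keeps perturbing the iterate even at the minimum. This one-dimensional example only shows that qualitative difference; it does not reproduce the flat-minimum selection result stated in the abstract.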

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
