LG | ML | Jun 18, 2019

On the interplay between noise and curvature and its effect on optimization and generalization

arXiv:1906.07774v2 | 71 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses foundational issues in optimization theory that matter to machine learning practitioners, though its contribution appears incremental, mainly clarifying existing concepts.

The paper investigates how the interaction between loss curvature and gradient noise affects optimization speed and generalization performance, clarifying distinctions between key matrices like the Fisher, Hessian, and gradient covariance.

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.
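
To make the distinction between the three quantities concrete, here is a minimal sketch for a scalar logistic regression model q_theta(y=1|x) = sigmoid(theta * x) with loss -log q_theta(y|x). The model, the toy data, and the function name `curvature_and_noise` are illustrative assumptions and do not come from the paper. For this particular model the Hessian and the Fisher coincide, because the curvature of the log loss does not depend on the label, whereas the gradient second moment and covariance depend on the observed labels and generally differ from both.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def curvature_and_noise(theta, data):
    """Return (hessian, fisher, second_moment, covariance) for a scalar
    logistic model, averaged over the (x, y) pairs in `data` (y in {0, 1}).

    Hypothetical helper for illustration only.
    """
    n = len(data)
    hessian = fisher = second_moment = mean_grad = 0.0
    for x, y in data:
        p = sigmoid(theta * x)
        grad = (p - y) * x  # gradient of -log q_theta(y|x) at the observed label
        # Hessian: second derivative of the loss (label-independent here).
        hessian += p * (1.0 - p) * x * x
        # Fisher: squared gradient averaged over labels drawn from the MODEL.
        fisher += p * ((p - 1.0) * x) ** 2 + (1.0 - p) * (p * x) ** 2
        # Uncentered second moment: squared gradient at the OBSERVED label.
        second_moment += grad ** 2
        mean_grad += grad
    hessian, fisher = hessian / n, fisher / n
    second_moment, mean_grad = second_moment / n, mean_grad / n
    # Centered covariance of the per-example gradients.
    covariance = second_moment - mean_grad ** 2
    return hessian, fisher, second_moment, covariance


# Toy data whose labels are not generated by the model (assumption).
data = [(1.0, 1), (2.0, 1), (-1.0, 0), (0.5, 0), (-2.0, 1)]
h, f, m, c = curvature_and_noise(0.3, data)
print(f"Hessian={h:.4f}  Fisher={f:.4f}  second moment={m:.4f}  covariance={c:.4f}")
```

The Fisher takes its expectation over labels sampled from the model, while the gradient second moment and covariance use the labels actually observed; the two only agree in special cases, such as a well-specified model at its optimum.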

