LG · ML · Nov 30, 2021

The Geometric Occam's Razor Implicit in Deep Learning

arXiv:2111.15090v2 · 8 citations
Originality: Incremental advance
AI Analysis

This paper contributes to the theoretical understanding of implicit regularization in deep learning and is aimed at researchers. It is an incremental advance, since it builds on known concepts such as Dirichlet energy.

The paper studies over-parameterized neural networks that fit their training data exactly. It argues that stochastic gradient descent implicitly regularizes these networks through a Geometric Occam's Razor: a preference for low geometric model complexity, measured by arc length in one-dimensional regression and by Dirichlet energy in higher dimensions. For ResNets trained on CIFAR-10, it reports Dirichlet energy measurements that are consistent with this effect.

In over-parameterized deep neural networks there can be many possible parameter configurations that fit the training data exactly. However, the properties of these interpolating solutions are poorly understood. We argue that over-parameterized neural networks trained with stochastic gradient descent are subject to a Geometric Occam's Razor; that is, these networks are implicitly regularized by the geometric model complexity. For one-dimensional regression, the geometric model complexity is simply given by the arc length of the function. For higher-dimensional settings, the geometric model complexity depends on the Dirichlet energy of the function. We explore the relationship between this Geometric Occam's Razor, the Dirichlet energy and other known forms of implicit regularization. Finally, for ResNets trained on CIFAR-10, we observe that Dirichlet energy measurements are consistent with the action of this implicit Geometric Occam's Razor.
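To make the two complexity measures named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes the textbook arc length of a graph, the integral of sqrt(1 + f'(x)^2), approximated by a polyline, and it assumes a sample-based stand-in for the Dirichlet energy: the mean squared gradient norm of a scalar function over a set of points, estimated with central finite differences. The function names `arc_length` and `dirichlet_energy`, the step size `eps`, and the choice to average over samples without a factor of 1/2 are all illustrative assumptions; the paper's exact normalization may differ.

```python
import numpy as np


def arc_length(f, a, b, n=10_001):
    """Approximate the arc length of the graph of f on [a, b] with a polyline.

    f must accept a 1-D array of inputs and return a 1-D array of outputs.
    """
    x = np.linspace(a, b, n)
    y = f(x)
    # Sum the lengths of the straight segments joining consecutive points.
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))


def dirichlet_energy(f, xs, eps=1e-5):
    """Mean squared gradient norm of a scalar function f over sample points.

    xs has shape (m, d); f maps an (m, d) array to an (m,) array.
    Partial derivatives are estimated by central finite differences
    (an assumption made for this sketch, not the paper's procedure).
    """
    xs = np.asarray(xs, dtype=float)
    _, d = xs.shape
    total = 0.0
    for i in range(d):
        step = np.zeros(d)
        step[i] = eps
        partial = (f(xs + step) - f(xs - step)) / (2.0 * eps)
        total += np.mean(partial ** 2)
    return float(total)


if __name__ == "__main__":
    # A wigglier interpolant has a longer graph, hence higher complexity.
    print(arc_length(np.sin, 0.0, 2.0 * np.pi))
    print(arc_length(lambda x: np.sin(10.0 * x), 0.0, 2.0 * np.pi))

    # Hypothetical linear function with gradient (3, 4) everywhere.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 2))
    print(dirichlet_energy(lambda x: 3.0 * x[:, 0] + 4.0 * x[:, 1], pts))
```

Under these assumptions, two functions that fit the same one-dimensional training points can be compared by arc length, and the smoother one scores lower. For a trained network, the gradients would normally come from automatic differentiation rather than finite differences.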
