ML · DIS-NN · STAT-MECH · LG · Jun 19, 2020

An analytic theory of shallow networks dynamics for hinge loss classification

arXiv:2006.11209v1 · 14.7 · 19 citations · Has Code

Originality: Incremental advance

AI Analysis

This work offers theoretical insight into the learning dynamics of neural networks and addresses a gap in understanding that matters to machine learning theorists. It is incremental in that it builds on existing mean-field approaches.

The authors develop an analytic theory of the training dynamics of shallow neural networks for classification. In a mean-field limit, they map a network with a single hidden layer onto a single-node problem and solve it for linearly separable data under hinge loss. The solution reveals phenomena such as a slowing down of training, a crossover between learning regimes, and overfitting.
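
To make the single-node mapping concrete, here is a minimal NumPy sketch of one full-batch gradient step on the hinge loss for a one-hidden-layer network with mean-field output scaling f(x) = (1/N) Σ_i a_i relu(w_i · x). The ReLU activation, the Gaussian synthetic data, the convention of multiplying the learning rate by N, and all function names (`network_output`, `hinge_loss`, `single_node_step`, `network_step`) are assumptions for illustration, not the authors' code. What it shows is that under hinge loss only samples with margin y f(x) < 1 contribute to the gradient. Every node therefore follows the same single-node rule, and the nodes are coupled only through which samples are still active. That set changes over time and depends on the whole population.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def network_output(a, W, X):
    """Mean-field output f(x) = (1/N) * sum_i a_i * relu(w_i . x) for each row of X."""
    return relu(X @ W.T) @ a / a.shape[0]

def hinge_loss(a, W, X, y):
    """Empirical hinge loss (1/P) * sum_mu max(0, 1 - y_mu f(x_mu))."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * network_output(a, W, X))))

def single_node_step(a_i, w_i, X_act, y_act, P, lr):
    """Gradient step for one node (a_i, w_i) that only sees the active samples.

    The learning rate is multiplied by N (assumed mean-field convention), so this
    update does not depend on N, and depends on the other nodes only through
    which samples are active (X_act, y_act).
    """
    pre = X_act @ w_i                                    # pre-activations, shape (P_act,)
    a_new = a_i + lr * np.sum(y_act * relu(pre)) / P
    w_new = w_i + lr * ((y_act * a_i * (pre > 0)) @ X_act) / P
    return a_new, w_new

def network_step(a, W, X, y, lr):
    """One full-batch step: compute the active set from the whole population,
    then apply the same single-node rule to every node independently."""
    P = X.shape[0]
    active = y * network_output(a, W, X) < 1.0          # margin below 1: still in the hinge region
    X_act, y_act = X[active], y[active]
    new_a, new_W = np.empty_like(a), np.empty_like(W)
    for i in range(a.shape[0]):
        new_a[i], new_W[i] = single_node_step(a[i], W[i], X_act, y_act, P, lr)
    return new_a, new_W

# Usage on a small synthetic, linearly separable dataset
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = np.sign(X @ rng.standard_normal(5))
a, W = rng.standard_normal(8), rng.standard_normal((8, 5))
a, W = network_step(a, W, X, y, lr=0.1)
print(hinge_loss(a, W, X, y))
```

Writing the update node by node is deliberately inefficient. It keeps visible that the per-node rule is identical and that the only shared quantity is the active set.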

Neural networks have been shown to perform remarkably well in classification tasks on structured, high-dimensional datasets. However, the learning dynamics of such networks are still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single-hidden-layer network trained to perform a classification task. We show that, in a suitable mean-field limit, this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average node population. We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss, for which the dynamics can be solved explicitly. This allows us to address, in a simple setting, several phenomena that appear in modern networks, such as the slowing down of training dynamics, the crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and training samples.
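
The finite-size setting in the last sentence can be explored numerically with a sketch like the one below. It is an assumption-laden illustration, not a reproduction of the paper's experiments. It trains a mean-field network with ReLU units at finite N and finite P on labels from a hypothetical random teacher perceptron (`w_star`), which makes the data linearly separable. Over time it records the hinge loss, the fraction of still-active samples, and train and test error. Comparing train and test error is one simple way to probe overfitting at finite P. The parameter values and the names `output` and `train` are arbitrary choices, and no particular outcome is implied.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def output(a, W, Z):
    # Mean-field scaling: f(x) = (1/N) * sum_i a_i * relu(w_i . x)
    return relu(Z @ W.T) @ a / a.shape[0]

def train(N=200, P=100, d=20, steps=1000, lr=0.2, record_every=100,
          P_test=2000, seed=0):
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)                  # hypothetical teacher direction
    X = rng.standard_normal((P, d)) / np.sqrt(d)
    y = np.sign(X @ w_star)                          # linearly separable labels
    X_te = rng.standard_normal((P_test, d)) / np.sqrt(d)
    y_te = np.sign(X_te @ w_star)
    a = rng.standard_normal(N)
    W = rng.standard_normal((N, d))

    history = []
    for t in range(steps + 1):
        margins = y * output(a, W, X)
        active = margins < 1.0                       # samples still in the hinge region
        if t % record_every == 0:
            history.append({
                "t": t,
                "train_loss": float(np.mean(np.maximum(0.0, 1.0 - margins))),
                "active_frac": float(active.mean()),
                "train_err": float(np.mean(margins <= 0)),
                "test_err": float(np.mean(y_te * output(a, W, X_te) <= 0)),
            })
        if t == steps:
            break
        X_act, y_act = X[active], y[active]
        pre = X_act @ W.T                            # (P_act, N) pre-activations
        # Minus N times the gradient of the hinge loss (learning rate scaled by N)
        step_a = y_act @ relu(pre) / P
        step_W = ((y_act[:, None] * (pre > 0)) * a).T @ X_act / P
        a = a + lr * step_a
        W = W + lr * step_W
    return history

for row in train():
    print(row)
```

Varying `N` and `P` while keeping the other defaults fixed gives a rough feel for how finite-size effects enter. The sketch makes no claim about matching the paper's quantitative predictions.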
