LG · NE · ML · May 28, 2019

SGD on Neural Networks Learns Functions of Increasing Complexity

arXiv:1905.11604v1 · 295 citations
Originality: Incremental advance
AI Analysis

This work offers machine learning researchers insight into the learning dynamics of SGD, though it is incremental: it builds on the existing understanding of optimization and generalization.

The study investigates how Stochastic Gradient Descent (SGD) trains deep neural networks. It finds that early performance improvements are largely explained by a linear classifier and that SGD learns functions of increasing complexity as iterations progress, which may explain why SGD-learned classifiers generalize well in over-parameterized regimes.

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model. Key to our work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information.
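
The abstract names a measure based on conditional mutual information but does not give its formula. As a rough illustration only, the sketch below assumes discrete predictions and labels and uses a simple plug-in (empirical count) estimator. The score `I(F; Y) - I(F; Y | L)` and all function names here are assumptions made for this example, not necessarily the paper's exact definition: `f` stands for the network's predictions, `l` for the simpler classifier's predictions, and `y` for the true labels.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    samples = list(samples)
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(f, y):
    """Estimate I(F; Y) = H(F) + H(Y) - H(F, Y) from paired samples."""
    return entropy(f) + entropy(y) - entropy(zip(f, y))


def conditional_mutual_information(f, y, l):
    """Estimate I(F; Y | L) = H(F, L) + H(Y, L) - H(F, Y, L) - H(L)."""
    return (entropy(zip(f, l)) + entropy(zip(y, l))
            - entropy(zip(f, y, l)) - entropy(l))


def explained_performance(f, y, l):
    """Assumed score: the part of I(F; Y) that disappears once L is known.

    Hypothetical helper for illustration; a value near I(F; Y) suggests
    that L accounts for most of what F knows about Y.
    """
    return mutual_information(f, y) - conditional_mutual_information(f, y, l)


if __name__ == "__main__":
    y = [0, 1, 0, 1]          # true labels
    f = [0, 1, 0, 1]          # network predictions (perfect here)
    l_same = [0, 1, 0, 1]     # simple classifier that agrees with f
    l_const = [0, 0, 0, 0]    # simple classifier that carries no information
    print(explained_performance(f, y, l_same))
    print(explained_performance(f, y, l_const))
```

With these toy inputs, a simple classifier that matches the network accounts for all of the network's information about the labels, while a constant classifier accounts for none. Plug-in estimates are biased on small samples, so a real experiment would need many test points or a corrected estimator.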

Code Implementations: 1 repo
