LGMLOct 4, 2020

Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron

arXiv:2010.01637v31 citations
Originality Incremental advance
AI Analysis

This provides theoretical insight into a widely used but poorly understood phenomenon in deep learning, though it is incremental as it focuses on a simplified setting.

The paper tackles the problem of explaining why over-parametrization accelerates optimization in neural networks, by proving that in a simple case with a single teacher neuron, over-parametrization helps gradient descent reach a global optimum faster, achieving zero testing error.

Over-parametrization has become a popular technique in deep learning. It is observed that by over-parametrization, a larger neural network needs a fewer training iterations than a smaller one to achieve a certain level of performance -- namely, over-parametrization leads to acceleration in optimization. However, despite that over-parametrization is widely used nowadays, little theory is available to explain the acceleration due to over-parametrization. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider the setting that there is a single teacher neuron with quadratic activation, where over-parametrization is realized by having multiple student neurons learn the data generated from the teacher neuron. We provably show that over-parametrization helps the iterate generated by gradient descent to enter the neighborhood of a global optimal solution that achieves zero testing error faster. On the other hand, we also point out an issue regarding the necessity of over-parametrization and study how the scaling of the output neurons affects the convergence time.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes