LG AP ML | Oct 25, 2019

Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations

arXiv:1910.11508v1 | 19 citations
Originality: Highly original
AI Analysis

This paper provides a theoretical explanation for the empirical success of over-parameterized neural networks, addressing a foundational gap in the machine learning community's understanding.

The paper explains why fully trained over-parameterized neural networks succeed in practice by introducing a theoretical framework called neural feature repopulation. It shows that, in the limit of infinitely many hidden neurons, two-level networks trained via noisy gradient descent learn a feature distribution that is nearly optimal for the task under certain conditions. Empirical studies confirm that the theory's predictions are consistent with results observed in practice.

Recently, over-parameterized neural networks have been extensively analyzed in the literature. However, previous studies cannot satisfactorily explain why fully trained neural networks are successful in practice. In this paper, we present a new theoretical framework for analyzing over-parameterized neural networks, which we call neural feature repopulation. Our analysis can satisfactorily explain the empirical success of two-level neural networks that are trained by standard learning algorithms. Our key theoretical result is that, in the limit of an infinite number of hidden neurons, over-parameterized two-level neural networks trained via standard (noisy) gradient descent learn a well-defined feature distribution (population), and the limiting feature distribution is nearly optimal for the underlying learning task under certain conditions. Empirical studies confirm that the predictions of our theory are consistent with the results observed in real practice.
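The setting in the abstract can be made concrete with a minimal NumPy sketch of noisy gradient descent on a wide two-level network. This is an illustration under stated assumptions, not the paper's algorithm: the tanh activation, the 1/m output scaling, the squared loss, the L2 regularizer, the Langevin-style noise scale, and every name and default value (`train_two_level`, `m`, `noise`, `weight_decay`) are choices made here for the example. The rows of `theta` play the role of the feature population, so their empirical distribution after training is the finite-width stand-in for the limiting feature distribution.

```python
import numpy as np


def train_two_level(X, y, m=512, steps=2000, lr=0.1,
                    noise=1e-3, weight_decay=1e-3, seed=0):
    """Noisy gradient descent on f(x) = (1/m) * sum_j u_j * tanh(theta_j . x).

    Returns the hidden weights theta (m, d), the top-layer weights u (m,),
    and the list of training losses recorded before each update.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.normal(size=(m, d))  # one hidden feature per row
    u = rng.normal(size=m)           # top-layer weight for each feature
    step_noise = np.sqrt(2.0 * lr * noise)  # Langevin-style scale (assumed)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ theta.T)     # (n, m) hidden activations
        resid = H @ u / m - y        # (n,) prediction error
        losses.append(0.5 * np.mean(resid ** 2))
        # Gradients are multiplied by m so that each neuron moves at O(1)
        # speed regardless of the width (mean-field time scaling).
        grad_u = H.T @ resid / n
        grad_theta = (resid[:, None] * (1.0 - H ** 2) * u[None, :]).T @ X / n
        u = (u - lr * (grad_u + weight_decay * u)
             + step_noise * rng.normal(size=m))
        theta = (theta - lr * (grad_theta + weight_decay * theta)
                 + step_noise * rng.normal(size=(m, d)))
    return theta, u, losses


# Toy regression problem (synthetic data, purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5) / np.sqrt(5)
y = np.tanh(X @ w_true)
theta, u, losses = train_two_level(X, y)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Increasing `m` while keeping the other settings fixed is one way to probe the infinite-width limit the abstract refers to: under the 1/m scaling, the trained rows of `theta` behave like samples from a distribution rather than like individually tuned parameters.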

