LG, AI, ST, ML · Dec 27, 2020

Mathematical Models of Overparameterized Neural Networks

arXiv:2012.13982v1 · 26 citations
AI Analysis

This review surveys recent theory explaining why overparameterized neural networks are easy to train, a question relevant to both researchers and practitioners in deep learning.

This paper reviews recent theoretical advancements in understanding overparameterized neural networks, particularly focusing on two-layer networks. It highlights how these systems can behave like convex systems under specific conditions, such as within the neural tangent kernel space.

Deep learning has achieved considerable empirical success in recent years. However, while practitioners have discovered many ad hoc tricks, until recently there was little theoretical understanding of the tricks invented in the deep learning literature. Practitioners have long known that overparameterized neural networks are easy to train, and the past few years have seen important theoretical developments in the analysis of such networks. In particular, it was shown that these systems behave like convex systems under various restricted settings, such as for two-layer NNs, and when learning is restricted locally to the so-called neural tangent kernel space around specialized initializations. This paper discusses some of this recent progress, which has led to a significantly better understanding of neural networks. We focus on the analysis of two-layer neural networks and explain the key mathematical models, along with their algorithmic implications. We then discuss challenges in understanding deep neural networks and some current research directions.
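To make the neural tangent kernel setting mentioned in the abstract concrete, here is a minimal NumPy sketch, not taken from the paper. It assumes a two-layer ReLU network f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j · x) with fixed second-layer signs a_j and trainable first-layer weights W only; the function names, the 1/sqrt(m) scaling, and the Gaussian initialization are illustrative assumptions. The linearized model is affine in W, which is why a convex loss such as squared error becomes convex in W within this local regime.

```python
import numpy as np

def init_params(d, m, rng):
    # Assumed initialization: Gaussian first layer, fixed random-sign second layer.
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a

def forward(W, a, X):
    # Two-layer ReLU network with 1/sqrt(m) scaling; X has shape (n, d).
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def grad_W(W, a, X):
    # Gradient of f(x_i) with respect to W for each input; shape (n, m, d).
    m = W.shape[0]
    act = (X @ W.T > 0).astype(float)  # ReLU activation pattern, shape (n, m)
    return (act * a)[:, :, None] * X[:, None, :] / np.sqrt(m)

def empirical_ntk(W, a, X):
    # Tangent kernel Gram matrix: K[i, j] = <grad f(x_i), grad f(x_j)>.
    J = grad_W(W, a, X).reshape(X.shape[0], -1)
    return J @ J.T

def linearized_forward(W0, a, X, W):
    # First-order Taylor expansion of the network around W0; affine in W.
    J = grad_W(W0, a, X)
    return forward(W0, a, X) + np.einsum("nmd,md->n", J, W - W0)
```

For example, after `W0, a = init_params(3, 512, np.random.default_rng(0))`, calling `empirical_ntk(W0, a, X)` on an input array `X` of shape `(n, 3)` returns an `n × n` symmetric positive semidefinite matrix, and `linearized_forward(W0, a, X, W)` stays close to `forward(W, a, X)` as long as `W` remains near `W0`.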
