LG · Mar 10, 2022

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models

arXiv:2203.05104v1 · 7 citations · h-index: 55
Originality: Incremental advance
AI Analysis

The paper offers a theoretical explanation for a foundational phenomenon in deep learning: why wide networks behave near-linearly. The advance is incremental, but it clarifies existing findings.

The paper addresses the counter-intuitive near-linearity of wide neural networks, arguing that it emerges from assembling many diverse weak sub-models and that it arises when no single sub-model dominates the assembly.
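To make the "weak sub-models" intuition concrete, here is the standard Hessian-scaling argument for a two-layer network. This is a minimal sketch under stated assumptions, not necessarily the paper's own construction: only the hidden weights $w_i$ are trained, the output weights are fixed with $v_i \in \{\pm 1\}$, and the activation satisfies $|\sigma''| \le \beta$. Each neuron $g_i$ is treated as one sub-model, and the $1/\sqrt{m}$ factor keeps every sub-model weak.

```latex
\begin{align*}
f(w;x) &= \frac{1}{\sqrt{m}} \sum_{i=1}^{m} g_i(w_i;x),
  \qquad g_i(w_i;x) := v_i\,\sigma(w_i^\top x), \\
\nabla_{w_i} f &= \frac{1}{\sqrt{m}}\, v_i\, \sigma'(w_i^\top x)\, x,
  \qquad \|\nabla_w f\|^2 = \frac{\|x\|^2}{m} \sum_{i=1}^{m} \sigma'(w_i^\top x)^2 = O(1)
  \ \text{(an average of $m$ bounded terms)}, \\
H &= \nabla_w^2 f = \operatorname{diag}(H_1, \dots, H_m),
  \qquad H_i = \frac{1}{\sqrt{m}}\, v_i\, \sigma''(w_i^\top x)\, x x^\top, \\
\|H\|_2 &= \max_i \|H_i\|_2 \le \frac{\beta\, \|x\|^2}{\sqrt{m}}, \\
\bigl| f(w;x) - f(w_0;x) - \nabla_w f(w_0;x)^\top (w - w_0) \bigr|
  &\le \frac{\beta\, \|x\|^2}{2\sqrt{m}}\, \|w - w_0\|^2 .
\end{align*}
```

Because each sub-model depends only on its own weights, the Hessian is block-diagonal, so its norm is set by the largest single block rather than by the sum over all $m$ neurons. The gradient stays $O(1)$ while the curvature shrinks like $1/\sqrt{m}$, which is one way to read "no single sub-model dominates" in this simple setting.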

Wide neural networks with a linear output layer have been shown to be near-linear, and to have near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive since in general neural networks are highly complex models. Why does a linear structure emerge when the networks become wide? In this work, we provide a new perspective on this "transition to linearity" by considering a neural network as an assembly model recursively built from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
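The transition to linearity can also be observed numerically. The sketch below is illustrative only and assumes a hypothetical setup not taken from the paper: a two-layer tanh network $f(W;x) = \frac{1}{\sqrt{m}}\sum_i v_i \tanh(w_i^\top x)$ with fixed output signs $v_i \in \{\pm 1\}$, a fixed unit-norm input, and random perturbations $D$ of fixed Frobenius norm. The helper names `net`, `grad_W`, and `linearization_error` are made up for this example. It measures how far the network deviates from its first-order Taylor expansion around random initial weights as the width $m$ grows.

```python
import numpy as np


def net(W, v, x):
    """Two-layer network f(W; x) = (1/sqrt(m)) * sum_i v_i * tanh(w_i . x).

    Each term v_i * tanh(w_i . x) plays the role of one "sub-model" (neuron);
    the 1/sqrt(m) factor keeps every individual sub-model weak.
    """
    m = W.shape[0]
    return v @ np.tanh(W @ x) / np.sqrt(m)


def grad_W(W, v, x):
    """Gradient of net() with respect to the hidden weights W (shape m x d)."""
    m = W.shape[0]
    z = W @ x
    # d/dz tanh(z) = 1 - tanh(z)^2
    coeff = v * (1.0 - np.tanh(z) ** 2) / np.sqrt(m)
    return np.outer(coeff, x)


def linearization_error(m, d=5, radius=1.0, trials=20, seed=0):
    """Mean |f(W0 + D) - [f(W0) + <grad f(W0), D>]| over random D with ||D||_F = radius."""
    rng = np.random.default_rng(seed)
    x = np.ones(d) / np.sqrt(d)  # fixed unit-norm input
    errs = []
    for _ in range(trials):
        W0 = rng.standard_normal((m, d))  # random initialization
        v = rng.choice([-1.0, 1.0], size=m)  # fixed, bounded output weights
        D = rng.standard_normal((m, d))
        D *= radius / np.linalg.norm(D)  # same Frobenius radius for every width
        exact = net(W0 + D, v, x)
        linear = net(W0, v, x) + np.sum(grad_W(W0, v, x) * D)
        errs.append(abs(exact - linear))
    return float(np.mean(errs))


if __name__ == "__main__":
    for m in [10, 100, 1000, 10000]:
        print(f"width m={m:>6}: mean linearization error = {linearization_error(m):.2e}")
```

The perturbation radius is held fixed across widths, so any drop in the error reflects the network itself becoming more linear in that ball. For random directions the decay is typically faster than the worst-case $1/\sqrt{m}$ curvature bound, because the second-order contributions of the individual neurons carry random signs and partially cancel.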

