LG, OC | Mar 31, 2023

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

arXiv:2303.17805v2 | 4 citations | h-index: 104
Originality: Incremental advance
AI Analysis

This paper provides theoretical insight into neural network training dynamics, a fundamental issue in machine learning. The advance is incremental because it builds on the existing concept of the regularization path.

The paper studies how the scale of the initialization affects the training of infinite-width 2-layer ReLU neural networks. Using a convex formulation based on unbalanced optimal-transport theory, it shows that the scaling path interpolates between the kernel and rich regimes. Numerical experiments confirm that the scaling path behaves similarly to the final states of the optimization path.

In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized from zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with nonzero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal-transport theory, we show that, despite the non-convexity of the 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional-optimization problem and investigate its main properties. In particular, we show that, as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly, even beyond these extreme points.
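To make the role of the initialization scale concrete, here is a minimal finite-width sketch. It is not the paper's infinite-width formulation. The function names, the `1/m` averaging, the Gaussian initialization, and the scale parameter `tau` are all assumptions chosen for illustration. It only checks one elementary fact: because ReLU is positively 1-homogeneous, scaling both layers of the network by `tau` scales the initial output by `tau^2`.

```python
import numpy as np

def two_layer_relu(X, A, b):
    """Average over m hidden units of b_j * relu(<a_j, x>) (assumed parameterization)."""
    return np.maximum(X @ A.T, 0.0) @ b / b.shape[0]

def init_weights(m, d, tau, seed=0):
    """Hypothetical Gaussian initialization of both layers, multiplied by the scale tau."""
    rng = np.random.default_rng(seed)
    A = tau * rng.standard_normal((m, d))  # hidden-layer weights, shape (m, d)
    b = tau * rng.standard_normal(m)       # output weights, shape (m,)
    return A, b

# A few random inputs; the reference network uses tau = 1.
X = np.random.default_rng(1).standard_normal((5, 3))
A1, b1 = init_weights(m=64, d=3, tau=1.0)
reference = two_layer_relu(X, A1, b1)

for tau in (0.1, 1.0, 10.0):
    A, b = init_weights(m=64, d=3, tau=tau)
    ratio = np.linalg.norm(two_layer_relu(X, A, b)) / np.linalg.norm(reference)
    # The ratio matches tau**2 up to floating-point error.
    print(f"tau={tau:5.1f}  output ratio={ratio:.6f}  tau^2={tau**2:.6f}")
```

The sketch covers the initial state only. It does not reproduce the scaling path or the kernel and rich regimes, which concern how the trained solution depends on the scale.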

