LG, OC, PR, ML | Oct 28, 2022

A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks

arXiv:2210.16286v1 | 5 citations | h-index: 68
Originality: Incremental advance
AI Analysis

This work provides incremental theoretical insights into neural network training dynamics, specifically for researchers in machine learning theory focusing on deep learning and mean-field approximations.

The authors study the training dynamics of three-layer neural networks whose first layer is random and fixed. They extend mean-field theory to functional spaces, prove that the training loss in $L_2$ regression converges to zero at a linear rate, and establish Rademacher complexity bounds for the solution spaces.

To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical guarantees of its convergence under gradient flow training as well as its approximation and generalization capabilities. In this work, we study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed. To define the limiting model rigorously, we generalize the MF theory of two-layer NNs by treating the neurons as belonging to functional spaces. Then, by writing the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive-definite, we prove that its training loss in $L_2$ regression decays to zero at a linear rate. Furthermore, we define function spaces that include the solutions obtainable through the MF training dynamics and prove Rademacher complexity bounds for these spaces. Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors while both exhibiting feature learning.
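
The step from a positive-definite kernel to a linear rate follows a standard pattern for kernel gradient flows. The lines below sketch that generic argument and are not the paper's proof. The training points $x_1, \dots, x_n$ with targets $y_i$, the residual vector $r_t$, the Gram matrix $G_t$, and the uniform lower bound $\lambda$ are notation assumed here for illustration; the paper's precise conditions and constants may differ.

$$
\mathcal{L}(t) = \frac{1}{2n} \|r_t\|^2, \qquad (r_t)_i = f_t(x_i) - y_i, \qquad \frac{d r_t}{dt} = -\frac{1}{n} G_t r_t, \qquad (G_t)_{ij} = K_t(x_i, x_j).
$$

Assuming $\lambda_{\min}(G_t / n) \ge \lambda > 0$ for all $t \ge 0$,

$$
\frac{d \mathcal{L}}{dt} = \frac{1}{n} r_t^\top \frac{d r_t}{dt} = -\frac{1}{n^2} r_t^\top G_t r_t \le -\frac{\lambda}{n} \|r_t\|^2 = -2 \lambda \mathcal{L}(t),
$$

so Grönwall's inequality gives

$$
\mathcal{L}(t) \le \mathcal{L}(0) \, e^{-2 \lambda t}.
$$

In continuous time, "linear rate" means this exponential decay. The kernel $K_t$ may change during training, as long as its smallest eigenvalue on the training set stays bounded away from zero.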

