LG | AI | Jan 13, 2024

Three Mechanisms of Feature Learning in a Linear Network

MIT
arXiv:2401.07085v34 citations | h-index: 18 | ICLR
Originality: Incremental advance
AI Analysis

This work offers incremental insights into neural network training dynamics, potentially guiding the design of more effective learning strategies for researchers and practitioners.

The authors derive an exact solution for the learning dynamics of a one-hidden-layer linear network and use it to identify three feature learning mechanisms, which they report also appear in deep nonlinear networks on real-world tasks.

Understanding the dynamics of neural networks in different width regimes is crucial for improving their training and performance. We present an exact solution for the learning dynamics of a one-hidden-layer linear network, with one-dimensional data, across any finite width, uniquely exhibiting both kernel and feature learning phases. This study marks a technical advancement by enabling the analysis of the training trajectory from any initialization and a detailed phase diagram under varying common hyperparameters such as width, layer-wise learning rates, and scales of output and initialization. We identify three novel prototype mechanisms specific to the feature learning regime: (1) learning by alignment, (2) learning by disalignment, and (3) learning by rescaling, which contrast starkly with the dynamics observed in the kernel regime. Our theoretical findings are substantiated with empirical evidence showing that these mechanisms also manifest in deep nonlinear networks handling real-world tasks, enhancing our understanding of neural network training dynamics and guiding the design of more effective learning strategies.
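To make the setting in the abstract concrete, here is a minimal numerical sketch of a one-hidden-layer linear network trained on one-dimensional data with layer-wise learning rates and an output scale. It is a toy under stated assumptions, not the paper's exact parameterization or closed-form solution: the function name `train_linear_net`, the synthetic target `y = 2x`, the hyperparameter defaults, and the cosine between the two weight vectors as an alignment diagnostic are all hypothetical choices made for illustration.

```python
import numpy as np


def train_linear_net(width=64, gamma=1.0, init_scale=0.1,
                     eta_u=0.05, eta_w=0.05, steps=500, seed=0):
    """Gradient descent on f(x) = gamma * (u . w) * x with scalar x and y.

    All names and defaults are illustrative assumptions, not taken from the paper.
    Returns the trained weights and a per-step list of (loss, cosine(u, w)).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=200)      # one-dimensional inputs
    y = 2.0 * x                   # assumed synthetic target: slope 2

    w = init_scale * rng.normal(size=width)   # input-to-hidden weights
    u = init_scale * rng.normal(size=width)   # hidden-to-output weights

    history = []
    for _ in range(steps):
        slope = gamma * (u @ w)               # the network is a single effective slope
        resid = slope * x - y
        loss = 0.5 * np.mean(resid ** 2)
        cos = (u @ w) / (np.linalg.norm(u) * np.linalg.norm(w))
        history.append((loss, cos))

        g = np.mean(resid * x)                # dL/d(slope)
        grad_u = gamma * g * w                # gradient of u points along w
        grad_w = gamma * g * u                # gradient of w points along u
        u, w = u - eta_u * grad_u, w - eta_w * grad_w   # layer-wise learning rates

    return u, w, history


if __name__ == "__main__":
    u, w, history = train_linear_net()
    print("initial loss, cosine:", history[0])
    print("final loss, cosine:  ", history[-1])
```

Because each layer's gradient points along the other layer's weight vector, the cosine between `u` and `w` changes during training in this toy. Varying `width`, `gamma`, `init_scale`, `eta_u`, and `eta_w` is one rough way to probe how the hyperparameters named in the abstract affect the trajectory.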

