LG
Oct 26, 2023

A Spectral Condition for Feature Learning

MIT
arXiv:2310.17813v2
92 citations
h-index: 14
Originality: Incremental advance
AI Analysis

This work gives researchers and practitioners a theoretical basis for feature learning, a foundational problem in deep learning. Its contribution is an incremental refinement of existing scaling techniques.

The paper asks how to scale neural network training so that feature learning occurs at every width. It proposes scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to existing heuristic scalings based on Frobenius norm and entry size.

The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks.
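To make the scaling rule concrete, here is a minimal NumPy sketch of what it means for a weight matrix and an update to have spectral norm proportional to $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$. The helper names `spectral_init` and `spectral_update` are hypothetical, and dividing by the exact spectral norm is an assumption made for illustration. It is not presented as the paper's own recipe, which may instead be stated in terms of layerwise initialization scales and learning rates.

```python
import numpy as np


def spectral_init(fan_out, fan_in, rng):
    """Gaussian matrix rescaled so its spectral norm is sqrt(fan_out / fan_in).

    Hypothetical helper: explicit normalization is an illustrative choice.
    """
    w = rng.standard_normal((fan_out, fan_in))
    target = np.sqrt(fan_out / fan_in)
    # ord=2 on a 2-D array returns the largest singular value.
    return w * (target / np.linalg.norm(w, ord=2))


def spectral_update(grad, fan_out, fan_in, lr):
    """Descent step rescaled to spectral norm lr * sqrt(fan_out / fan_in).

    Hypothetical helper: `grad` stands in for any gradient-like matrix.
    """
    target = lr * np.sqrt(fan_out / fan_in)
    return -grad * (target / np.linalg.norm(grad, ord=2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for width in (64, 256, 1024):
        fan_in, fan_out = width, 4 * width
        w = spectral_init(fan_out, fan_in, rng)
        step = spectral_update(rng.standard_normal((fan_out, fan_in)),
                               fan_out, fan_in, lr=0.1)
        # The spectral norms stay fixed as width grows; the Frobenius norm does not.
        print(width,
              np.linalg.norm(w, ord=2),
              np.linalg.norm(step, ord=2),
              np.linalg.norm(w, ord="fro"))
```

With the fan-out to fan-in ratio held fixed, the spectral norms of the weights and of the step are the same at every width, while the Frobenius norm of the same matrices grows with width. This is why the two norms lead to different scaling rules.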
