LG · ST · ML · Feb 15, 2022

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

arXiv:2202.07626v4 · 37 citations
Originality: Highly original
AI Analysis

This paper provides theoretical insight into how neural networks learn features from random initialization, a foundational problem in machine learning theory.

The paper studies feature learning in two-layer ReLU networks trained by gradient descent on XOR-like data with adversarial label noise. It shows that these networks achieve generalization error close to the noise rate, even though linear classifiers do no better than random guessing on the same distribution.

In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
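The setup in the abstract can be illustrated with a small NumPy experiment. This is a minimal sketch under several assumptions that are not taken from the paper: the data are four 2-D Gaussian clusters (a toy stand-in for the XOR-like distribution), labels are flipped uniformly at random rather than by an adversary, the second-layer weights are fixed at ±1/√m so that only the first layer is trained, and all names and hyperparameters (`make_xor_data`, `train`, `m`, `lr`, `steps`) are hypothetical choices for illustration.

```python
import numpy as np

def make_xor_data(n, rng, noise_rate=0.1, scale=3.0, std=0.5):
    """Toy XOR-like data (assumed): clusters at +/-(scale, 0) are labeled +1,
    clusters at +/-(0, scale) are labeled -1."""
    centers = scale * np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
    center_labels = np.array([1.0, 1.0, -1.0, -1.0])
    idx = rng.integers(0, 4, size=n)
    x = centers[idx] + std * rng.standard_normal((n, 2))
    y_clean = center_labels[idx]
    # Random flips stand in for the adversarial corruption in the abstract.
    flip = rng.random(n) < noise_rate
    y_noisy = np.where(flip, -y_clean, y_clean)
    return x, y_clean, y_noisy

def forward(w, a, x):
    """Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x)."""
    return np.maximum(x @ w.T, 0.0) @ a

def train(x, y, m=50, lr=0.5, steps=300, init_std=0.1, seed=0):
    """Full-batch gradient descent on the logistic loss, first layer only."""
    rng = np.random.default_rng(seed)
    w = init_std * rng.standard_normal((m, x.shape[1]))  # small random init
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)  # fixed signs
    for _ in range(steps):
        pre = x @ w.T                                   # pre-activations, (n, m)
        f = np.maximum(pre, 0.0) @ a                    # network outputs, (n,)
        g = 0.5 * (1.0 - np.tanh(0.5 * y * f))          # sigmoid(-y f), stable form
        coeff = (g * y)[:, None] * (pre > 0) * a[None, :]
        grad = -(coeff.T @ x) / len(y)                  # dLoss/dw, shape (m, d)
        w -= lr * grad
    return w, a

rng = np.random.default_rng(0)
x_train, _, y_train = make_xor_data(400, rng)        # train on noisy labels
x_test, y_test_clean, _ = make_xor_data(2000, rng)   # evaluate on clean labels
w, a = train(x_train, y_train)
test_error = float(np.mean(np.sign(forward(w, a, x_test)) != y_test_clean))
print(f"clean test error: {test_error:.3f}")
```

In this toy setting, neurons with positive output weight tend to rotate toward the ±(1, 0) directions and neurons with negative output weight toward ±(0, 1), which is a loose, low-dimensional picture of weakly correlated random features being amplified into useful ones. It does not reproduce the paper's distribution, noise model, or analysis.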

