LGNov 17, 2022

Why Deep Learning Generalizes

arXiv:2211.09639v24 citationsh-index: 4
Originality Incremental advance
AI Analysis

This addresses the fundamental problem of understanding generalization in deep learning for researchers and practitioners, though it is incremental in exploring existing biases.

The paper investigates why deep learning models generalize rather than memorize, finding that memorization is difficult relative to generalization but becomes easier with added noise, and shows that generalization arises from parameters being attracted to points of maximal stability during gradient descent.

Very large deep learning models trained using gradient descent are remarkably resistant to memorization given their huge capacity, but are at the same time capable of fitting large datasets of pure noise. Here methods are introduced by which models may be trained to memorize datasets that normally are generalized. We find that memorization is difficult relative to generalization, but that adding noise makes memorization easier. Increasing the dataset size exaggerates the characteristics of that dataset: model access to more training samples makes overfitting easier for random data, but somewhat harder for natural images. The bias of deep learning towards generalization is explored theoretically, and we show that generalization results from a model's parameters being attracted to points of maximal stability with respect to that model's inputs during gradient descent.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes