LGNov 7, 2022

Do highly over-parameterized neural networks generalize since bad solutions are rare?

arXiv:2211.03570v42 citationsh-index: 36
Originality Incremental advance
AI Analysis

This provides a theoretical explanation for the generalization of over-parameterized networks, addressing a key puzzle in deep learning, though it is incremental as it builds on existing ERM and generalization theory.

The paper tackles the problem of why highly over-parameterized neural networks generalize well despite having many global minima with zero training error, showing that under certain conditions, the fraction of 'bad' minima with high true error decays exponentially with the number of training data, as validated by experiments on synthetic data, MNIST, and Caltech101 subsets.

We study over-parameterized classifiers where Empirical Risk Minimization (ERM) for learning leads to zero training error. In these over-parameterized settings there are many global minima with zero training error, some of which generalize better than others. We show that under certain conditions the fraction of "bad" global minima with a true error larger than ε decays to zero exponentially fast with the number of training data n. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g. the number of parameters) of the classifier function set. This insight may provide a novel perspective on the unexpectedly good generalization even of highly over-parameterized neural networks. We substantiate our theoretical findings through experiments on synthetic data and a subset of MNIST. Additionally, we assess our hypothesis using VGG19 and ResNet18 on a subset of Caltech101.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes