Provable Tempered Overfitting of Minimal Nets and Typical Nets
This provides theoretical insights into why deep neural networks can generalize well despite overfitting, addressing a fundamental problem in machine learning theory for researchers and practitioners.
The paper tackles the overfitting behavior of deep neural networks with binary weights when perfectly classifying noisy training data, proving that overfitting is tempered for both minimal and random interpolating networks. This result is the first theoretical demonstration of tempered overfitting in deep networks without extreme input dimension requirements.
We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.