LG | Jun 17, 2022

How You Start Matters for Generalization

Sameera Ramasinghe, Lachlan MacDonald, Moshiur Farazi, Hemanth Saratchandran, Simon Lucey

arXiv:2206.08558v2 | 7.88 citations | h-index: 57

Originality: Highly original

AI Analysis

This paper addresses the open problem of neural network generalization, offering researchers a new theoretical framework that challenges existing conjectures such as the flat-minima conjecture.

The paper studies generalization in over-parameterized neural networks by shifting focus from architecture and optimization to initialization. Using Fourier analysis, it shows that generalization is heavily tied to initialization, and it validates this empirically with deep networks.

Characterizing the remarkable generalization properties of over-parameterized neural networks remains an open problem. In this paper, we promote a shift of focus towards initialization rather than neural architecture or (stochastic) gradient descent to explain this implicit regularization. Through a Fourier lens, we derive a general result for the spectral bias of neural networks and show that the generalization of neural networks is heavily tied to their initialization. Further, we empirically solidify the developed theoretical insights using practical, deep networks. Finally, we make a case against the controversial flat-minima conjecture and show that Fourier analysis grants a more reliable framework for understanding the generalization of neural networks.
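As a rough illustration of the Fourier view described in the abstract, the sketch below compares the spectrum of a randomly initialized network at two initialization scales. It is a toy example and not the paper's experiment: the one-hidden-layer tanh network, the helper names `random_tanh_net` and `high_freq_fraction`, the scales 1 and 50, and the frequency cutoff are all assumptions chosen for illustration.

```python
import numpy as np

def random_tanh_net(x, first_layer_std, width=256, seed=0):
    """Evaluate a randomly initialized one-hidden-layer tanh network on 1-D inputs.

    Hypothetical toy model: only the first-layer weight scale is varied.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, first_layer_std, size=(1, width))        # input-to-hidden weights
    b1 = rng.uniform(-1.0, 1.0, size=width)                       # hidden biases
    w2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))   # hidden-to-output weights
    hidden = np.tanh(x[:, None] * w1 + b1)                        # shape (n_points, width)
    return (hidden @ w2).ravel()

def high_freq_fraction(y, cutoff=8):
    """Fraction of spectral power at or above the frequency bin `cutoff`."""
    # Remove the mean and apply a Hann window to limit boundary artifacts.
    y = (y - y.mean()) * np.hanning(len(y))
    power = np.abs(np.fft.rfft(y)) ** 2
    return power[cutoff:].sum() / power.sum()

x = np.linspace(-1.0, 1.0, 1024, endpoint=False)

# Same random seed, so the two networks differ only in first-layer scale.
frac_small = high_freq_fraction(random_tanh_net(x, first_layer_std=1.0))
frac_large = high_freq_fraction(random_tanh_net(x, first_layer_std=50.0))

print(f"high-frequency power fraction, std=1:  {frac_small:.4g}")
print(f"high-frequency power fraction, std=50: {frac_large:.4g}")
```

In this toy setting, the larger first-layer scale puts more of the initial function's power into high frequencies, which gives one concrete sense in which the starting point shapes the spectrum a network begins from. It says nothing by itself about training dynamics or generalization, which the paper treats theoretically and empirically.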
