Fantastic Generalization Measures and Where to Find Them
This work provides a large-scale empirical analysis of how reliably proposed complexity measures predict generalization in deep networks.
The authors conducted a large-scale study to evaluate over 40 complexity measures for generalization in deep networks, training over 10,000 convolutional networks with varied hyperparameters, and identified failures and promising measures for further research.
Generalization of deep networks has been of great interest in recent years, resulting in a number of theoretically and empirically motivated complexity measures. However, most papers proposing such measures study only a small set of models, leaving open the question of whether the conclusions drawn from those experiments would remain valid in other settings. We present the first large-scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research.
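The abstract does not say how the relationship between a measure and generalization is scored. As an illustration only, one plausible way to check whether a complexity measure tracks generalization across a set of trained models is a rank correlation between the measure's values and the observed generalization gaps. The sketch below assumes Kendall's tau (the tau-a variant, with no tie correction) as that score; the function name `kendall_tau` and the toy numbers are hypothetical and are not taken from the study.

```python
from itertools import combinations


def kendall_tau(measure, gap):
    """Kendall rank correlation (tau-a) between two equal-length sequences.

    measure: complexity-measure value for each trained model (hypothetical input)
    gap:     generalization gap (e.g. test error minus train error) for each model
    """
    if len(measure) != len(gap):
        raise ValueError("measure and gap must have the same length")
    n = len(measure)
    if n < 2:
        raise ValueError("need at least two models to compare")

    concordant = 0
    discordant = 0
    # Compare every pair of models: a pair is concordant when the model with
    # the larger measure value also has the larger generalization gap.
    for i, j in combinations(range(n), 2):
        s = (measure[i] - measure[j]) * (gap[i] - gap[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # Tied pairs (s == 0) count as neither concordant nor discordant.

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy, made-up values for four models (illustration only, not real results).
toy_measure = [0.8, 1.5, 2.1, 3.0]
toy_gap = [0.02, 0.05, 0.04, 0.09]
tau = kendall_tau(toy_measure, toy_gap)
```

A value near 1 would mean the measure ranks models in the same order as their generalization gaps, a value near -1 the opposite order, and a value near 0 no consistent relationship. A single correlation over all models can hide confounding by hyperparameters, which is one reason to prefer carefully controlled comparisons.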