Intraclass clustering: an implicit learning ability that regularizes DNNs
This work addresses a fundamental issue in machine learning and gives researchers insights into generalization, though it is incremental in that it builds on prior studies of regularization.
The paper tackles the problem of understanding regularization mechanisms in deep neural networks. It hypothesizes that intraclass clustering acts as an implicit regularizer and shows that measures of this clustering predict generalization performance across variations of many hyperparameters.
Several works have shown that the regularization mechanisms underlying the generalization performance of deep neural networks are still poorly understood. In this paper, we hypothesize that deep neural networks are regularized through their ability to extract meaningful clusters among the samples of a class. This constitutes an implicit form of regularization, as no explicit training mechanism or supervision targets such behaviour. To support our hypothesis, we design four different measures of intraclass clustering, based on the neuron- and layer-level representations of the training data. We then show that these measures are accurate predictors of generalization performance across variations of a large set of hyperparameters (learning rate, batch size, optimizer, weight decay, dropout rate, data augmentation, network depth and width).
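The abstract does not define the four measures, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that subclass labels are available for the training samples of one class, which is an assumption for illustration and not something stated above. The hypothetical layer-level score `intraclass_clustering_score` divides the mean between-subclass distance by the mean within-subclass distance in a layer's representation, so higher values mean tighter subclass clusters. To check whether such a measure "predicts generalization", the sketch uses Kendall's rank correlation between the measure and test accuracy across trained models. This is a common choice for evaluating generalization measures, but it is not necessarily the authors' protocol. All names are illustrative.

```python
import itertools
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intraclass_clustering_score(reps, subclass):
    """Hypothetical layer-level measure for the samples of ONE class.

    reps: per-sample representation vectors taken from some layer.
    subclass: assumed subclass label for each sample (an illustrative assumption).
    Returns mean between-subclass distance / mean within-subclass distance.
    Needs at least two subclasses and one subclass with two or more samples.
    """
    within, between = [], []
    for i, j in itertools.combinations(range(len(reps)), 2):
        d = euclidean(reps[i], reps[j])
        # Route each pairwise distance by whether the pair shares a subclass.
        (within if subclass[i] == subclass[j] else between).append(d)
    return mean(between) / mean(within)


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a); assumes no ties within xs or ys."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in itertools.combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy 2-D "representations": two well-separated groups of two points.
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    # Subclass labels that match the geometry vs. labels that cut across it.
    print(intraclass_clustering_score(points, [0, 0, 1, 1]))
    print(intraclass_clustering_score(points, [0, 1, 0, 1]))
```

In an actual study, the per-class scores would be aggregated into one value per trained model, and `kendall_tau` would then compare those values with the models' test accuracies. A tau close to 1 would mean that the measure ranks the models in the same order as their generalization performance.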