Detecting Memorization in ReLU Networks
This work addresses overfitting and memorization in neural networks, a concern for both researchers and practitioners. Its contribution is incremental, since it builds on existing approaches to memorization detection.
The paper tackles the problem of detecting memorization in neural networks by proposing a new measure of non-linearity based on the non-negative rank of activation matrices. It finds that high non-linearity in deep layers indicates memorization, which makes the measure usable as an early-stopping criterion.
We propose a new notion of 'non-linearity' of a network layer with respect to an input batch that is based on its proximity to a linear system, which is reflected in the non-negative rank of the activation matrix. We measure this non-linearity by applying non-negative factorization to the activation matrix. Considering batches of similar samples, we find that high non-linearity in deep layers is indicative of memorization. Furthermore, by applying our approach layer-by-layer, we find that the mechanism for memorization consists of distinct phases. We perform experiments on fully-connected and convolutional neural networks trained on several image and audio datasets. Our results demonstrate that as an indicator for memorization, our technique can be used to perform early stopping.
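To make the idea of measuring non-linearity through non-negative factorization concrete, the following is a minimal sketch, not the authors' implementation. It assumes the activation matrix of a ReLU layer is approximated as A ≈ W H with non-negative factors of rank k, and uses the relative reconstruction error as a rough proxy for how far the layer is from a low non-negative rank (near-linear) regime on that batch. The function names `nmf_error` and `relu_activations`, the choice of Lee-Seung multiplicative updates, the random layer weights, and the ranks tried are all assumptions made for illustration.

```python
import numpy as np

def nmf_error(A, k, n_iter=200, seed=0, eps=1e-9):
    """Relative Frobenius error of a rank-k non-negative fit A ~ W @ H.

    Hypothetical helper: uses Lee-Seung multiplicative updates, which is
    one common NMF solver and not necessarily the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return np.linalg.norm(A - W @ H) / (np.linalg.norm(A) + eps)

def relu_activations(X, weight, bias):
    """Activation matrix of one ReLU layer: rows are samples, columns are units."""
    return np.maximum(X @ weight + bias, 0.0)

rng = np.random.default_rng(0)

# Hypothetical batch of 32 samples and a random 20 -> 50 ReLU layer.
X = rng.random((32, 20))
weight = rng.standard_normal((20, 50))
bias = np.zeros(50)
A = relu_activations(X, weight, bias)

# Reconstruction error as a function of the factorization rank k.
errors = {k: nmf_error(A, k) for k in (1, 2, 4, 8)}

# Reference case: a matrix with non-negative rank at most 3 by construction.
B = rng.random((32, 3)) @ rng.random((3, 50))
low_rank_error = nmf_error(B, 3)

for k, err in errors.items():
    print(f"rank {k}: relative error {err:.3f}")
print(f"low-rank reference, rank 3: relative error {low_rank_error:.3f}")
```

In this sketch, an error that stays high as k grows suggests the batch's activations are not well described by a few non-negative components. Applying the same measurement to each layer, on batches of similar samples, is one way to approximate the layer-by-layer analysis described above.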