LG · CV · ML · Jul 21, 2021

Memorization in Deep Neural Networks: Does the Loss Function matter?

arXiv:2107.09957v2 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses overfitting in deep learning, for both researchers and practitioners, by offering a way to mitigate memorization that does not rely on standard regularization techniques.

The paper investigates whether the choice of loss function affects the memorization of randomly labeled data in deep neural networks, finding that symmetric loss functions significantly improve resistance to overfitting compared to cross-entropy or squared error loss on MNIST and CIFAR-10 datasets.

Deep neural networks, often owing to overparameterization, have been shown to be capable of exactly memorizing even randomly labelled data. Empirical studies have also shown that none of the standard regularization techniques mitigates such overfitting. We investigate whether the choice of loss function can affect this memorization. We show empirically, on the benchmark data sets MNIST and CIFAR-10, that a symmetric loss function, as opposed to either cross-entropy or squared error loss, significantly improves the ability of the network to resist such overfitting. We then provide a formal definition of robustness to memorization and a theoretical explanation of why symmetric losses provide this robustness. Our results clearly bring out the role that loss functions alone can play in this phenomenon of memorization.
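The abstract does not spell out what "symmetric" means. A minimal sketch follows, assuming the definition commonly used in the label-noise literature: a loss L is symmetric if the sum of L(f(x), j) over all K class labels j is the same constant for every prediction f(x). Mean absolute error (MAE) on softmax outputs is used here as an illustrative symmetric loss; whether it is among the losses the paper evaluates is not stated in the abstract, and the helper names below are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Standard cross-entropy against a one-hot target.
    return -math.log(probs[label])

def mae(probs, label):
    # L1 distance between the predicted distribution and the one-hot target.
    return sum(abs((1.0 if k == label else 0.0) - p)
               for k, p in enumerate(probs))

def sum_over_labels(loss, probs):
    # Assumed symmetry check: add up the loss for every possible label.
    return sum(loss(probs, j) for j in range(len(probs)))

# Two very different predictions over K = 3 classes (illustrative values).
confident = softmax([4.0, 0.5, -1.0])
uncertain = softmax([0.2, 0.1, 0.0])

# For MAE the sum equals 2K - 2 = 4 regardless of the prediction
# (up to floating-point rounding), so MAE satisfies the assumed condition.
mae_sums = [sum_over_labels(mae, p) for p in (confident, uncertain)]

# For cross-entropy the sum changes with the prediction, so it does not.
ce_sums = [sum_over_labels(cross_entropy, p) for p in (confident, uncertain)]

print("MAE sums:", mae_sums)
print("CE sums: ", ce_sums)
```

Under this assumed definition, the constant sum means that lowering the loss for one label necessarily raises it for the others, which is one intuition for why such losses are bounded and less prone to fitting arbitrary labels.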

Code Implementations: 1 repo
