LG · CV · ML · Mar 31, 2020

Regularizing Class-wise Predictions via Self-knowledge Distillation

arXiv:2003.13964v2 · 339 citations
AI Analysis

This paper addresses overfitting and overconfidence in neural networks for image classification. The contribution is incremental, since it builds on existing knowledge distillation techniques.

The paper tackles overfitting in deep neural networks with a self-knowledge distillation method that penalizes disagreement between the predictive distributions of different samples sharing the same label, which improves generalization and calibration on image classification tasks.
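A minimal sketch of the class-wise regularizer, assuming the formulation usually attributed to this paper: for a sample x with label y and a different sample x' drawn from the same class, the loss is the ordinary cross-entropy on x plus λ·T²·KL(P(·|x', T) ‖ P(·|x, T)), where P(·|·, T) is a softmax at temperature T and the prediction on x' is treated as a fixed target. The function names, the use of plain Python floats instead of a deep learning framework, and the defaults T = 4 and λ = 1 are illustrative assumptions, not details given on this page.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy of the T = 1 softmax against an integer label."""
    return -math.log(softmax(logits)[label])


def class_wise_kd_loss(logits_x: Sequence[float],
                       logits_partner: Sequence[float],
                       label: int,
                       temperature: float = 4.0,
                       lam: float = 1.0) -> float:
    """Cross-entropy on x plus lam * T^2 * KL(P(.|x', T) || P(.|x, T)).

    logits_partner are the logits of a *different* sample x' with the same
    label (hypothetical pairing; how x' is drawn is assumed, not specified here).
    In an autodiff framework they would be detached so that x' acts as a
    fixed soft target; plain floats carry no gradients, so nothing to do here.
    """
    target = softmax(logits_partner, temperature)  # soft target from x'
    pred = softmax(logits_x, temperature)          # prediction being regularized
    distill = kl_divergence(target, pred)
    # T^2 keeps the distillation gradient scale comparable as T changes.
    return cross_entropy(logits_x, label) + lam * temperature ** 2 * distill


if __name__ == "__main__":
    x = [3.0, 1.0, 0.2]         # hypothetical logits for a class-0 sample
    x_same = [2.5, 1.2, 0.1]    # another class-0 sample that agrees with x
    x_diff = [0.2, 1.0, 3.0]    # a class-0 partner whose prediction disagrees
    print(class_wise_kd_loss(x, x_same, label=0))
    print(class_wise_kd_loss(x, x_diff, label=0))
```

With identical logits for x and x' the KL term vanishes and only the cross-entropy remains; the more two same-class predictions disagree, the larger the penalty, which is one way to read the class-wise consistency the summary describes.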

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes discrepancies between the predictive distributions of similar samples. In particular, we distill the predictive distribution between different samples of the same label during training. This regularizes the dark knowledge (i.e., the knowledge on wrong predictions) of a single network (i.e., a self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Our experimental results on various image classification tasks demonstrate that this simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern convolutional neural networks.
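Calibration is typically measured with expected calibration error (ECE): each prediction's confidence (its maximum softmax probability) is placed in a confidence bin, and the gap between average confidence and accuracy in each bin is averaged with bin-size weights. A minimal sketch, assuming equal-width bins (15 is a common choice, not a value taken from this page) and a hypothetical function name `expected_calibration_error`. An overconfident model, the failure mode the abstract targets, shows up as confidence exceeding accuracy.

```python
import math
from typing import List, Sequence


def expected_calibration_error(confidences: Sequence[float],
                               predictions: Sequence[int],
                               labels: Sequence[int],
                               n_bins: int = 15) -> float:
    """ECE over equal-width confidence bins (b/n_bins, (b+1)/n_bins].

    Each non-empty bin contributes |accuracy - mean confidence|, weighted by
    the fraction of all samples that fall into it.
    """
    n = len(confidences)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Clamp so that a confidence of exactly 0.0 lands in the first bin.
        b = min(n_bins - 1, max(0, math.ceil(c * n_bins) - 1))
        bins[b].append(i)

    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(predictions[i] == labels[i] for i in members) / len(members)
        mean_conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical overconfident model: 90% confidence, 50% accuracy.
    conf = [0.9, 0.9, 0.9, 0.9]
    pred = [0, 1, 2, 3]
    true = [0, 1, 0, 0]
    print(expected_calibration_error(conf, pred, true))  # ≈ 0.4
```

Lower is better: a regularizer that tempers overconfident softmax outputs should pull mean confidence toward accuracy and shrink this number.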

Code Implementations: 1 repo
