CV AI · Jul 6, 2021

Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation

Yufei Wang, Haoliang Li, Lap-pui Chau, Alex C. Kot

arXiv:2107.02629v1 17.554 citations

Originality: Incremental advance

AI Analysis

The paper addresses a practical limitation of neural networks: they generalize poorly when training data is limited. It appears to be an incremental improvement over existing knowledge distillation approaches.

The paper tackles the problem of poor generalization in convolutional neural networks when training data is insufficient or unrepresentative by proposing KDDG, a knowledge distillation framework with a gradient filter regularization term. Experiments show the method significantly improves generalization across image classification, segmentation, and reinforcement learning tasks compared to state-of-the-art domain generalization techniques.

Though convolutional neural networks are widely used in different tasks, their lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder their practical application. In this paper, we propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG), which is built upon a knowledge distillation framework with the gradient filter as a novel regularization term. We find that both the "richer dark knowledge" from the teacher network and the gradient filter we propose can reduce the difficulty of learning the mapping, which further improves the generalization ability of the model. We also conduct extensive experiments to show that our framework can significantly improve the generalization capability of deep neural networks in different tasks, including image classification, segmentation, and reinforcement learning, by comparing our method with existing state-of-the-art domain generalization techniques. Last but not least, we adopt two metrics to analyze our proposed method in order to better understand how it benefits the generalization capability of deep neural networks.
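
For readers unfamiliar with the knowledge distillation framework the abstract builds on, the following is a minimal sketch of the standard distillation objective (temperature-softened teacher targets combined with a hard-label term). It is not the paper's KDDG method: the gradient filter is not specified in this summary and is deliberately left out. The function names and the `temperature` and `alpha` values are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic distillation loss (assumed form, not the paper's exact objective).

    alpha weights the hard-label term; (1 - alpha) weights the soft-target term.
    """
    # Hard-label term: ordinary cross-entropy against the ground-truth class.
    hard = -math.log(softmax(student_logits)[label] + 1e-12)

    # Soft-target term: match the teacher's softened distribution, which also
    # carries the relative probabilities of the non-target classes.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = cross_entropy(teacher_soft, student_soft)

    # The T^2 factor is the customary rescaling that keeps the soft-term
    # gradients on a scale comparable to the hard-label term.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [5.0, 2.0, 0.0]
    print(distillation_loss([5.0, 2.0, 0.0], teacher, label=0))
    print(distillation_loss([0.0, 2.0, 5.0], teacher, label=0))
```

In this sketch, a student whose logits agree with the teacher receives a lower loss than one that ranks the classes in the opposite order. The softened teacher distribution is what the distillation literature calls "dark knowledge".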
