CV, LG | Dec 7, 2020

Model Compression Using Optimal Transport

arXiv:2012.03907v15.810 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of deploying deep learning models in resource-constrained environments by improving model compression techniques, which is relevant for practitioners and researchers working on efficient AI.

This paper explores using optimal transport-based loss functions to train smaller student networks in knowledge distillation, aiming to align student and teacher feature distributions. The method achieves comparable or superior image classification results on CIFAR-100, SVHN, and ImageNet compared to other loss functions.

Model compression methods make it easier to deploy deep learning models in compute-, memory-, and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithms in which knowledge from a large teacher network is transferred to a smaller student network, thereby improving the student's performance. In this paper, we show how optimal transport-based loss functions can be used to train a student network; these losses encourage the student to learn parameters that bring the distribution of its features closer to that of the teacher's features. We present image classification results on CIFAR-100, SVHN, and ImageNet and show that the proposed optimal transport loss functions perform comparably to or better than other loss functions.
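To make the idea of "bringing the student feature distribution closer to the teacher's" concrete, here is a minimal sketch of one common optimal transport loss: the entropy-regularized (Sinkhorn) transport cost between a batch of student features and a batch of teacher features. The abstract does not say which formulation the paper uses, so the squared Euclidean ground cost, the uniform batch weights, and the names `sinkhorn_ot_loss`, `epsilon`, and `num_iters` are all assumptions made for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sinkhorn_ot_loss(student_feats, teacher_feats, epsilon=0.1, num_iters=200):
    """Approximate entropy-regularized OT cost between two feature batches.

    Illustrative assumptions (not taken from the paper): uniform weights on
    every sample, squared Euclidean ground cost, plain Sinkhorn iterations
    without log-domain stabilization.
    """
    n, m = len(student_feats), len(teacher_feats)

    # Pairwise ground cost C[i][j] between student i and teacher j.
    cost = [[squared_distance(s, t) for t in teacher_feats] for s in student_feats]

    # Gibbs kernel K = exp(-C / epsilon). Very large C / epsilon can underflow.
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]

    # Uniform marginals over the two batches.
    a = [1.0 / n] * n
    b = [1.0 / m] * m

    # Sinkhorn scaling vectors, updated alternately to match each marginal.
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(num_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan P[i][j] = u[i] * K[i][j] * v[j]; the loss is <P, C>.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


if __name__ == "__main__":
    teacher = [[0.0, 0.0], [1.0, 1.0]]
    aligned_student = [[0.0, 0.0], [1.0, 1.0]]
    shifted_student = [[3.0, 3.0], [4.0, 4.0]]
    print(sinkhorn_ot_loss(aligned_student, teacher))
    print(sinkhorn_ot_loss(shifted_student, teacher))
```

The loss is close to zero when the two feature batches coincide and grows as the student features drift away from the teacher's. In actual training this quantity would be computed in an autodiff framework so that gradients reach the student parameters, and a log-domain Sinkhorn variant is usually preferable because the plain iterations converge slowly and can underflow for small `epsilon`.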
