CV, LG | Dec 7, 2020

Model Compression Using Optimal Transport

arXiv:2012.03907v1 | 10 citations
AI Analysis

This work addresses the problem of deploying deep learning models in resource-constrained environments by improving model compression techniques, which is relevant for practitioners and researchers working on efficient AI.

This paper explores using optimal transport-based loss functions to train smaller student networks in knowledge distillation, aiming to align student and teacher feature distributions. The method achieves comparable or superior image classification results on CIFAR-100, SVHN, and ImageNet compared to other loss functions.

Model compression methods are important for easing the deployment of deep learning models in compute-, memory-, and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithms in which knowledge from a large teacher network is transferred to a smaller student network, thereby improving the student's performance. In this paper, we show how optimal transport-based loss functions can be used to train a student network, encouraging it to learn parameters that bring the distribution of student features closer to that of the teacher features. We present image classification results on CIFAR-100, SVHN, and ImageNet and show that the proposed optimal transport loss functions perform comparably to or better than other loss functions.
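The abstract does not spell out the exact form of the loss. As a rough illustration of what "bringing the student feature distribution closer to the teacher's" can mean, the sketch below computes an entropy-regularized optimal transport cost (the Sinkhorn algorithm) between a batch of student features and a batch of teacher features. The uniform weights, the squared Euclidean ground cost, the regularization strength `eps`, and the function names are all assumptions made for illustration; this is not necessarily the paper's formulation.

```python
import math


def squared_euclidean_cost(xs, ys):
    """Pairwise squared Euclidean distances between two lists of feature vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_ot_loss(student_feats, teacher_feats, eps=0.1, n_iters=100):
    """Entropy-regularized OT cost between two uniformly weighted feature batches.

    Hypothetical helper: assumes features are small in magnitude (e.g. normalized),
    since exp(-cost / eps) underflows for very large costs. A log-domain variant
    would be more robust.
    """
    n, m = len(student_feats), len(teacher_feats)
    cost = squared_euclidean_cost(student_feats, teacher_feats)
    # Gibbs kernel K_ij = exp(-C_ij / eps).
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on student samples (assumption)
    b = [1.0 / m] * m  # uniform mass on teacher samples (assumption)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j; the loss is the transport cost <P, C>.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


teacher = [[0.0, 0.0], [1.0, 1.0]]
aligned_student = [[0.0, 0.0], [1.0, 1.0]]
shifted_student = [[2.0, 2.0], [3.0, 3.0]]

loss_aligned = sinkhorn_ot_loss(aligned_student, teacher)
loss_shifted = sinkhorn_ot_loss(shifted_student, teacher)
print(loss_aligned < loss_shifted)
```

When the student features coincide with the teacher features the cost is close to zero, and it grows as the two batches drift apart. In actual training, a loss of this kind would be written in an automatic differentiation framework so that gradients flow back to the student's parameters.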

