LG · CV · ML · May 20, 2019

Zero-Shot Knowledge Distillation in Deep Networks

arXiv:1905.08114v1 · 263 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy and scalability concerns in knowledge distillation for applications involving, for example, biometric or medical data. It is an incremental advance because it builds on existing distillation methods.

The paper tackles the problem of training a Student model without access to the original training data, which may be unavailable because of privacy constraints or its sheer size. It synthesizes "data impressions" from the Teacher model and, on multiple benchmark datasets, achieves generalization performance competitive with distillation on the actual training data.
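Synthesizing a data impression can be sketched, under loud assumptions, as optimizing an input rather than model weights: start from random noise and run gradient descent on the input so that the frozen Teacher's temperature-softened output matches a target softmax vector. In this sketch the Teacher is a hypothetical toy linear classifier so the gradient can be written by hand in plain Python; the function names (`synthesize_impression`, `sample_dirichlet`, `teacher_logits`) are illustrative, not the authors' code. The target vector is drawn from a Dirichlet distribution whose concentration values are arbitrary placeholders, not the Teacher-derived parameters a full implementation would need.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def teacher_logits(weights, bias, x):
    """Hypothetical linear Teacher: logits = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def synthesize_impression(weights, bias, target, temperature=1.0,
                          lr=0.5, steps=1000, rng=None):
    """Optimize a random input so softmax(teacher(x) / T) matches `target`.

    Minimizes the cross-entropy between `target` and the Teacher's softened
    output by plain gradient descent on the input x; the Teacher stays frozen.
    """
    if rng is None:
        rng = random.Random(0)
    num_classes, dim = len(weights), len(weights[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from random noise
    for _ in range(steps):
        probs = softmax(teacher_logits(weights, bias, x), temperature)
        # Gradient of the cross-entropy w.r.t. the logits: (probs - target) / T
        g_logits = [(p - t) / temperature for p, t in zip(probs, target)]
        # Chain rule through the linear Teacher: dL/dx = W^T g_logits
        g_x = [sum(weights[k][d] * g_logits[k] for k in range(num_classes))
               for d in range(dim)]
        x = [xi - lr * gi for xi, gi in zip(x, g_x)]
    return x


if __name__ == "__main__":
    rng = random.Random(42)
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy 3-class, 2-feature Teacher
    b = [0.0, 0.0, 0.0]
    target = sample_dirichlet([2.0, 2.0, 2.0], rng)  # placeholder concentrations
    x = synthesize_impression(W, b, target, rng=rng)
    print("target :", [round(t, 3) for t in target])
    print("teacher:", [round(p, 3) for p in softmax(teacher_logits(W, b, x))])
```

With a real deep Teacher the hand-written gradient would be replaced by automatic differentiation, and a higher temperature typically needs a larger step size or more iterations, because the 1/T factor shrinks the gradient.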

Knowledge distillation deals with the problem of training a smaller model (Student) from a high-capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or if it poses privacy or safety concerns (e.g., biometric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without using any meta-data, we synthesize Data Impressions from the complex Teacher model and use these as surrogates for the original training data samples to transfer its learning to the Student via knowledge distillation. We therefore dub our method "Zero-Shot Knowledge Distillation" and demonstrate that our framework achieves generalization performance competitive with that of distillation using the actual training data samples on multiple benchmark datasets.
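The transfer step can be sketched as ordinary distillation applied to synthetic inputs. This sketch assumes the common temperature-softened formulation: the Student is trained to minimize the cross-entropy between the Teacher's and its own softmax outputs at temperature T, scaled by T² so the gradient magnitude stays comparable across temperatures. That scaling, the toy linear Student and Teacher, and the names `distillation_loss`, `distill_step` and `mean_distillation_loss` are assumptions for illustration; random inputs stand in for data impressions.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, bias, x):
    """Logits of a hypothetical linear model: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy between softened Teacher and Student outputs, scaled by T**2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    ce = -sum(t * math.log(max(s, 1e-12)) for t, s in zip(p_t, p_s))
    return ce * temperature ** 2


def mean_distillation_loss(s_weights, s_bias, t_weights, t_bias, batch, temperature):
    """Average distillation loss of the Student over a batch of inputs."""
    return sum(distillation_loss(linear_logits(s_weights, s_bias, x),
                                 linear_logits(t_weights, t_bias, x), temperature)
               for x in batch) / len(batch)


def distill_step(s_weights, s_bias, t_weights, t_bias, batch, temperature, lr):
    """One SGD pass of a linear Student over a batch; updates the Student in place."""
    for x in batch:
        p_t = softmax(linear_logits(t_weights, t_bias, x), temperature)
        p_s = softmax(linear_logits(s_weights, s_bias, x), temperature)
        for k, (ps, pt) in enumerate(zip(p_s, p_t)):
            # d(loss)/d(student logit k) = T * (p_s[k] - p_t[k]) given the T**2 scaling
            g = temperature * (ps - pt)
            s_bias[k] -= lr * g
            for d, xd in enumerate(x):
                s_weights[k][d] -= lr * g * xd


if __name__ == "__main__":
    rng = random.Random(0)
    T = 2.0
    t_W, t_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]
    s_W, s_b = [[0.0, 0.0] for _ in range(3)], [0.0, 0.0, 0.0]
    # Random inputs stand in for data impressions in this toy example.
    batch = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(32)]
    print("before:", mean_distillation_loss(s_W, s_b, t_W, t_b, batch, T))
    for _ in range(50):
        distill_step(s_W, s_b, t_W, t_b, batch, T, lr=0.05)
    print("after: ", mean_distillation_loss(s_W, s_b, t_W, t_b, batch, T))
```

Because the Student only ever sees the Teacher's softened outputs on the synthetic inputs, no original training sample is needed at this stage; how well this transfers to real data depends on how representative the synthesized inputs are.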

Code implementations: 1 repo
