CV · LG · Nov 20, 2019

Search to Distill: Pearls are Everywhere but not the Eyes

arXiv:1911.09074v2 · 73 citations
Originality: Incremental advance
AI Analysis

This work addresses a limitation of standard knowledge distillation, which fixes the student architecture in advance, by searching for the student's architecture as well as training its parameters. The result is more effective model compression for deployment in resource-constrained environments, representing a significant but incremental advance in the field.

The paper proposes an architecture-aware approach to knowledge distillation that jointly optimizes the parameters and the architecture of the student model. It reports state-of-the-art results on ImageNet classification under various latency settings and shows that the best student architecture transfers to other tasks such as face recognition and ensemble learning.

Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture. However, the knowledge of a neural network, which is represented by the network's output distribution conditioned on its input, depends not only on its parameters but also on its architecture. Hence, a more generalized approach for KD is to distill the teacher's knowledge into both the parameters and architecture of the student. To achieve this, we present a new Architecture-aware Knowledge Distillation (AKD) approach that finds student models (pearls for the teacher) that are best for distilling the given teacher model. In particular, we leverage Neural Architecture Search (NAS), equipped with our KD-guided reward, to search for the best student architectures for a given teacher. Experimental results show our proposed AKD consistently outperforms the conventional NAS plus KD approach, and achieves state-of-the-art results on the ImageNet classification task under various latency settings. Furthermore, the best AKD student architecture for the ImageNet classification task also transfers well to other tasks such as million-level face recognition and ensemble learning.
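
To make the two ingredients named in the abstract more concrete, here is a minimal sketch in plain Python. The `kd_loss` function is the common temperature-softened KL divergence between teacher and student output distributions, which is one standard way to do KD. The `kd_guided_reward` function is hypothetical: the abstract does not give the reward formula, so the form used here (accuracy of the distilled student multiplied by a soft latency penalty, in the style of latency-aware NAS rewards) and the names `distilled_accuracy`, `latency_ms`, `target_ms`, and `w` are assumptions for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the widely used soft-target distillation loss, not
    necessarily the exact loss used in the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def kd_guided_reward(distilled_accuracy, latency_ms, target_ms, w=-0.07):
    """Hypothetical search reward (assumed form, not from the source).

    distilled_accuracy: validation accuracy of a student trained WITH
    the teacher's soft targets, so the search favors architectures
    that distill well rather than ones that merely train well alone.
    The latency term penalizes students slower than the target.
    """
    return distilled_accuracy * (latency_ms / target_ms) ** w
```

In a search loop under these assumptions, a controller would propose a student architecture, the student would be trained briefly with `kd_loss` against the fixed teacher, and `kd_guided_reward` would score it; the only difference from a conventional NAS reward is that the accuracy term comes from the distilled student.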

