LG · ML · Nov 8, 2019

Deep geometric knowledge distillation with graphs

arXiv:1911.03080v1 · 44 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient deep learning in embedded systems by improving knowledge distillation, though it appears incremental as it builds on existing RKD methods.

The paper tackles the problem of knowledge distillation for resource-constrained applications by proposing a graph-based relative knowledge distillation method that captures the geometry of latent spaces, allowing dimension-agnostic transfer and leading to better accuracy for the same budget compared to existing alternatives.

In most cases, deep learning architectures are trained without regard to the number of operations or the energy they consume. However, some applications, like embedded systems, can be resource-constrained during inference. A popular approach to reducing the size of a deep learning architecture consists in distilling knowledge from a bigger network (teacher) to a smaller one (student). Directly training the student to mimic the teacher representation can be effective, but it requires that both share the same latent space dimensions. In this work, we focus instead on relative knowledge distillation (RKD), which considers the geometry of the respective latent spaces, allowing for dimension-agnostic transfer of knowledge. Specifically, we introduce a graph-based RKD method, in which graphs are used to capture the geometry of latent spaces. Using classical computer vision benchmarks, we demonstrate the ability of the proposed method to efficiently distill knowledge from the teacher to the student, leading to better accuracy for the same budget as compared to existing RKD alternatives.
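
To make the dimension-agnostic idea concrete, here is a minimal sketch of one way a graph-based relative distillation loss could be written. It is not taken from the paper's code: the choice of cosine similarity as edge weight, the mean squared difference between adjacency matrices, and the function names `cosine_similarity_graph` and `graph_distillation_loss` are all assumptions made for illustration. The point it shows is that both graphs have one node per batch example, so teacher and student latent widths never need to match.

```python
import numpy as np


def cosine_similarity_graph(latents):
    """Dense adjacency matrix of pairwise cosine similarities for one batch.

    latents has shape (batch_size, latent_dim). The result has shape
    (batch_size, batch_size), so it does not depend on latent_dim.
    Cosine similarity is an assumed edge weight, chosen for illustration.
    """
    norms = np.linalg.norm(latents, axis=1, keepdims=True)
    unit = latents / np.maximum(norms, 1e-12)  # guard against zero vectors
    adjacency = unit @ unit.T
    np.fill_diagonal(adjacency, 0.0)  # drop self-loops
    return adjacency


def graph_distillation_loss(teacher_latents, student_latents):
    """Mean squared difference between teacher and student graphs (assumed loss)."""
    a_teacher = cosine_similarity_graph(teacher_latents)
    a_student = cosine_similarity_graph(student_latents)
    return float(np.mean((a_teacher - a_student) ** 2))


# Hypothetical batch: 8 examples, 64-d teacher latents, 16-d student latents.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 64))
student = rng.normal(size=(8, 16))
loss = graph_distillation_loss(teacher, student)
```

In a training loop, a term like this would typically be added to the student's task loss and computed in a framework with automatic differentiation; NumPy is used here only to keep the sketch self-contained.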

Code Implementations: 1 repo
