Distilling Knowledge from Graph Convolutional Networks
This work addresses a gap in knowledge distillation for non-grid data such as graphs; the contribution is incremental in that it adapts existing distillation concepts to a new domain.
The paper tackles knowledge distillation for graph convolutional networks (GCNs), a setting that had been overlooked relative to CNNs. It proposes a local structure preserving module that transfers topological semantics from the teacher to the student, yielding a compact yet high-performance student model that achieves state-of-the-art distillation performance on two datasets.
Existing knowledge distillation methods focus on convolutional neural networks (CNNs), where input samples such as images lie in a grid domain, and have largely overlooked graph convolutional networks (GCNs), which handle non-grid data. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a pre-trained GCN model. To enable knowledge transfer from the teacher GCN to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher. In this module, the local structure information of both the teacher and the student is extracted as distributions; minimizing the distance between these distributions therefore enables topology-aware knowledge transfer from the teacher, yielding a compact yet high-performance student model. Moreover, the proposed approach is readily extendable to dynamic graph models, where the input graphs for the teacher and the student may differ. We evaluate the proposed method on two different datasets using GCN models of different architectures, and demonstrate that our method achieves state-of-the-art knowledge distillation performance for GCN models. Code is publicly available at https://github.com/ihollywhy/DistillGCN.PyTorch.
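The idea of extracting local structure as distributions and matching them can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the distance between distributions, so the following choices are assumptions for illustration only: each node's local structure is taken to be a softmax over its neighbors of the negative squared Euclidean distance between feature vectors (an RBF-style kernel), and teacher and student distributions are compared with KL divergence, averaged over nodes. The function names (`local_structure`, `lsp_loss`), the types, and the toy graph are hypothetical and are not taken from the released code. Because each distribution is computed within a single model's feature space, the teacher and student embeddings may have different widths, which is what allows a smaller student to be compared with a larger teacher.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical types for this sketch: node id -> feature vector, node id -> neighbor ids.
Features = Dict[int, Sequence[float]]
Graph = Dict[int, List[int]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_structure(features: Features, graph: Graph, node: int) -> List[float]:
    """Distribution over the neighbors of `node` (assumed kernel: softmax of -squared distance)."""
    scores = [-squared_distance(features[node], features[j]) for j in graph[node]]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing, eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def lsp_loss(teacher: Features, student: Features, graph: Graph) -> float:
    """Average KL divergence between teacher and student local structures over non-isolated nodes."""
    nodes = [v for v in graph if graph[v]]
    if not nodes:
        return 0.0
    total = 0.0
    for v in nodes:
        p = local_structure(teacher, graph, v)  # teacher's local structure at v
        q = local_structure(student, graph, v)  # student's local structure at v
        total += kl_divergence(p, q)
    return total / len(nodes)


# Toy example: teacher embeddings are 3-D, student embeddings are 2-D.
graph: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
teacher: Features = {0: [0.0, 0.0, 1.0], 1: [0.1, 0.0, 0.9], 2: [1.0, 1.0, 0.0], 3: [1.0, 0.9, 0.1]}
student: Features = {0: [0.0, 1.0], 1: [1.0, 0.0], 2: [0.5, 0.5], 3: [0.0, 0.0]}

print("LSP loss (teacher vs. itself):", lsp_loss(teacher, teacher, graph))
print("LSP loss (teacher vs. student):", lsp_loss(teacher, student, graph))
```

In an actual training loop, a term like this would presumably be added to the student's task loss with some weighting factor and computed on autograd tensors rather than Python lists; both of those details, and the extension to dynamic graphs whose teacher and student inputs differ, are outside the scope of this sketch.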