LG · AI · Oct 25, 2022

Online Cross-Layer Knowledge Distillation on Graph Neural Networks with Deep Supervision

arXiv:2210.13743v1 · 4 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses model compression for GNNs in edge computing. It offers an online distillation approach that is an incremental advance over existing offline methods.

The paper tackles the challenge of deploying graph neural networks (GNNs) on resource-constrained edge devices. It proposes Alignahead++, an online knowledge distillation framework that needs no pre-trained teacher model and uses deep supervision to prevent over-smoothing. Experiments on four datasets (PPI, Cora, PubMed and CiteSeer) show that student performance improves consistently, and that the gains generally grow as the number of students increases.
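To make the deep-supervision idea concrete, here is a minimal sketch in plain Python. It assumes that every GNN layer, including the last, feeds its own classifier head, and that the auxiliary cross-entropy losses are summed with a single weight. The function names and the `aux_weight` parameter are hypothetical; the paper's actual heads and weighting may differ.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single node, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def mean_cross_entropy(logits_per_node: List[List[float]], labels: List[int]) -> float:
    """Average cross-entropy over all labelled nodes for one classifier head."""
    return sum(cross_entropy(z, y) for z, y in zip(logits_per_node, labels)) / len(labels)


def deeply_supervised_loss(layer_logits: List[List[List[float]]],
                           labels: List[int],
                           aux_weight: float = 1.0) -> float:
    """layer_logits[l][i] holds the class logits that the head attached to
    layer l predicts for node i. The last head is the main output; every
    earlier head is an auxiliary classifier (hypothetical uniform weighting)."""
    main = mean_cross_entropy(layer_logits[-1], labels)
    aux = sum(mean_cross_entropy(head, labels) for head in layer_logits[:-1])
    return main + aux_weight * aux


# Usage: 3 layers, 2 nodes, 2 classes, all-zero logits, so each head costs log 2.
zeros = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
print(deeply_supervised_loss(zeros, [0, 1]))
```

Because every intermediate layer must stay linearly separable for its own head, the node embeddings are discouraged from collapsing to near-identical vectors, which is the over-smoothing failure the summary refers to.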

Graph neural networks (GNNs) have become one of the most popular research topics in both academia and industry owing to their strong ability to handle irregular graph data. However, large-scale datasets pose great challenges for deploying GNNs on edge devices with limited resources, so model compression techniques have drawn considerable research attention. Existing model compression techniques such as knowledge distillation (KD) mainly focus on convolutional neural networks (CNNs). Only limited attempts have recently been made to distill knowledge from GNNs, and these work in an offline manner. Because the performance of a GNN teacher does not necessarily improve as the number of layers increases, selecting an appropriate teacher model requires substantial effort. To address these challenges, we propose a novel online knowledge distillation framework called Alignahead++. Alignahead++ transfers structure and feature information from a layer of one student to the previous layer of another, simultaneously trained student in an alternating training procedure. Meanwhile, to avoid the over-smoothing problem in GNNs, Alignahead++ employs deep supervision by adding an auxiliary classifier to each intermediate layer, which prevents the collapse of node feature embeddings. Experimental results on four datasets, PPI, Cora, PubMed and CiteSeer, demonstrate that student performance is consistently boosted in our collaborative training framework without supervision from a pre-trained teacher model, and that its effectiveness generally improves as the number of students increases.
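The cross-layer "align ahead" transfer between two students can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Both students are assumed to have the same number of layers and equal embedding widths. Feature knowledge is matched with mean squared error, and structure knowledge is approximated by the cosine similarity of the two endpoint embeddings of each edge, which may differ from the structure term the paper uses. The helpers `align_ahead_pairs`, `align_ahead_loss` and `roles_for_step` and the weight `beta` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # one row of embedding values per node


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error; returns 0.0 for empty inputs."""
    if not a:
        return 0.0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def edge_similarities(h: Matrix, edges: List[Tuple[int, int]]) -> List[float]:
    """Assumed structure proxy: similarity of the endpoint embeddings of each edge."""
    return [cosine(h[i], h[j]) for i, j in edges]


def align_ahead_pairs(num_layers: int) -> List[Tuple[int, int]]:
    """(source_layer, target_layer) pairs, 0-indexed: layer k of the source
    student supervises layer k - 1 of the target student."""
    return [(k, k - 1) for k in range(1, num_layers)]


def align_ahead_loss(source: List[Matrix], target: List[Matrix],
                     edges: List[Tuple[int, int]], beta: float = 1.0) -> float:
    """Sum of a feature (MSE) term and a weighted structure term over all
    align-ahead layer pairs; source[l] and target[l] are layer-l embeddings."""
    total = 0.0
    for k_src, k_tgt in align_ahead_pairs(len(source)):
        hs, ht = source[k_src], target[k_tgt]
        feature = sum(mse(t, s) for t, s in zip(ht, hs)) / len(ht)
        structure = mse(edge_similarities(ht, edges), edge_similarities(hs, edges))
        total += feature + beta * structure
    return total


def roles_for_step(step: int) -> Tuple[str, str]:
    """Alternate which student is the source (teacher) and which is the target."""
    return ("A", "B") if step % 2 == 0 else ("B", "A")
```

In a real training loop the source student's embeddings would be detached (treated as constants) so that only the target student is updated in that step. Swapping roles on alternate steps lets each student act as the teacher for the other, which is what removes the need for a separately pre-trained teacher.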
