LG · ML · Nov 9, 2021

On Representation Knowledge Distillation for Graph Neural Networks

Chaitanya K. Joshi, Fayao Liu, Xu Xun, Jie Lin, Chuan-Sheng Foo

arXiv:2111.04964v4 · 21.889 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making resource-efficient GNNs more effective for real-world graph applications, representing an incremental improvement over prior distillation techniques.

The paper improves knowledge distillation for graph neural networks (GNNs) by proposing Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to preserve the global topology of the teacher's embedding space. Across 4 datasets and 14 architectures, G-CRD consistently boosts student performance and outperforms existing methods such as LSP.
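A contrastive distillation objective of this kind can be sketched as follows. This is a minimal illustration, not the paper's exact G-CRD loss. It assumes an InfoNCE-style formulation in which both models' node embeddings are linearly projected into a shared space. Each node's student embedding is then pulled toward the same node's teacher embedding and pushed away from every other node's teacher embedding. The function names, the projections `w_s`/`w_t`, and the temperature of 0.1 are hypothetical choices. The real method may differ in details such as how negatives are sampled.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # row-major, shape (out_dim, in_dim)


def project(v: Vector, weight: Matrix) -> Vector:
    """Linearly map an embedding into the (hypothetical) shared representation space."""
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def contrastive_distillation_loss(
    student: List[Vector],
    teacher: List[Vector],
    w_s: Matrix,
    w_t: Matrix,
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE-style distillation loss (a stand-in, not exact G-CRD).

    Anchor: node i's student embedding. Positive: node i's teacher embedding.
    Negatives: all other nodes' teacher embeddings. The teacher is treated as
    fixed; only the student and the projections would be trained.
    """
    zs = [l2_normalize(project(h, w_s)) for h in student]
    zt = [l2_normalize(project(h, w_t)) for h in teacher]
    n = len(zs)
    total = 0.0
    for i in range(n):
        # Cosine similarity of student node i to every teacher node, divided by temperature.
        logits = [sum(a * b for a, b in zip(zs[i], zt[j])) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all teacher nodes.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching teacher node as the correct "class".
        total += log_sum - logits[i]
    return total / n


identity = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_distillation_loss([[1.0, 0.0], [0.0, 1.0]], teacher, identity, identity)
swapped = contrastive_distillation_loss([[0.0, 1.0], [1.0, 0.0]], teacher, identity, identity)
print(f"{aligned:.4f} {swapped:.4f}")  # → 0.0000 10.0000
```

Because the loss operates in a projected space, `w_s` and `w_t` can map student and teacher embeddings of different widths to a common dimension. Those projections are also what give the method its "shared representation space".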

Knowledge distillation is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships defined over edges between the student's and teacher's node embeddings. This paper studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving variant of LSP) as well as baselines from 2D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other. Our code is available at https://github.com/chaitjo/efficient-gnns.
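The LSP objective described above can be sketched as follows. This is a minimal illustration under stated assumptions. Each node's "local structure" is taken to be a softmax over its neighbors of a kernel similarity, here the log of an RBF kernel, -‖z_i − z_j‖². The student is penalized by the KL divergence from the teacher's local structure to its own, averaged over nodes. The kernel choice and the names `local_structure` and `lsp_loss` are illustrative assumptions, not the reference implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings of equal width."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_structure(emb: List[Vector], neighbors: Dict[int, List[int]], i: int) -> List[float]:
    """Softmax over node i's neighbors of the log-RBF similarity -||z_i - z_j||^2."""
    scores = [-sq_dist(emb[i], emb[j]) for j in neighbors[i]]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lsp_loss(student: List[Vector], teacher: List[Vector], neighbors: Dict[int, List[int]]) -> float:
    """Mean KL(teacher || student) of local structures over nodes with at least one neighbor."""
    total, count = 0.0, 0
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue
        p_t = local_structure(teacher, neighbors, i)
        p_s = local_structure(student, neighbors, i)
        total += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
        count += 1
    return total / max(count, 1)


# Toy triangle graph; the teacher embeds nodes in 2-D, the student in 1-D.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
teacher = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
student = [[0.0], [2.0], [2.5]]
print(lsp_loss(teacher, teacher, neighbors))  # → 0.0
print(lsp_loss(student, teacher, neighbors) > 0)  # → True
```

Similarities are computed within each model's own embedding space and only compared across edges. So the student and teacher widths can differ without any projection. This also shows why the loss constrains only edge-local relationships: node pairs that are not connected never enter the objective.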

