LG | Jun 14, 2022

FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks

Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang

arXiv:2206.06561v417.751 citations | h-index: 40

Originality: Highly original

AI Analysis

FreeKD addresses a practical bottleneck in applying knowledge distillation to GNNs, offering researchers and practitioners a novel alternative to conventional teacher-student setups.

The paper tackles the difficulty of training an effective teacher graph neural network (GNN) for knowledge distillation, a difficulty that stems from over-parametrization and over-smoothing. It proposes FreeKD, a reinforcement learning-based framework in which two shallower GNNs exchange knowledge dynamically. On five benchmark datasets, FreeKD performs comparably to or better than traditional methods that rely on deeper teachers.

Knowledge distillation (KD) has proven effective at boosting the performance of graph neural networks (GNNs), where the goal is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is difficult to train a satisfactory teacher GNN because of the well-known over-parametrization and over-smoothing issues, which leads to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via Reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. The core idea of our work is to collaboratively build two shallower GNNs that exchange knowledge with each other via reinforcement learning in a hierarchical way. Because we observe that a typical GNN model often performs better at some nodes and worse at others during training, we devise a dynamic and free-direction knowledge transfer strategy that consists of two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and then 2) a structure-level action determines which of the local structures generated by the node-level actions are propagated. In essence, FreeKD is a general and principled framework that is naturally compatible with GNNs of different architectures. Extensive experiments on five benchmark datasets demonstrate that FreeKD outperforms the two base GNNs by a large margin and is effective across various GNNs. More surprisingly, FreeKD achieves comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
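
To make the node-level idea concrete, the following is a minimal NumPy sketch of per-node, free-direction distillation between two networks. It is an illustration under stated assumptions, not the paper's method: the abstract says directions are chosen by a reinforcement learning agent, whereas this sketch substitutes a simple heuristic (the network with the lower cross-entropy at a labeled node acts as teacher there). The structure-level action is omitted entirely, and all function names, logits, and labels are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def node_level_directions(p_a, p_b, labels):
    # ASSUMPTION: heuristic stand-in for the learned node-level action.
    # Returns 1 where network A teaches B, 0 where B teaches A,
    # picking the network with lower cross-entropy at each labeled node.
    idx = np.arange(len(labels))
    ce_a = -np.log(p_a[idx, labels] + 1e-12)
    ce_b = -np.log(p_b[idx, labels] + 1e-12)
    return (ce_a < ce_b).astype(int)

def free_direction_kd_loss(p_a, p_b, directions):
    # Per-node KL(teacher || student); the teacher is chosen per node.
    eps = 1e-12
    kl_a_to_b = np.sum(p_a * (np.log(p_a + eps) - np.log(p_b + eps)), axis=1)
    kl_b_to_a = np.sum(p_b * (np.log(p_b + eps) - np.log(p_a + eps)), axis=1)
    to_b = directions            # nodes where B is the student
    to_a = 1 - directions        # nodes where A is the student
    loss_b = np.sum(to_b * kl_a_to_b) / max(to_b.sum(), 1)
    loss_a = np.sum(to_a * kl_b_to_a) / max(to_a.sum(), 1)
    return loss_a, loss_b

# Hypothetical logits from two shallow GNNs on three labeled nodes, two classes.
logits_a = np.array([[2.0, 0.0], [0.0, 0.5], [0.0, 1.0]])
logits_b = np.array([[0.0, 1.0], [0.0, 2.0], [1.0, 0.0]])
labels = np.array([0, 1, 0])

p_a, p_b = softmax(logits_a), softmax(logits_b)
directions = node_level_directions(p_a, p_b, labels)
loss_a, loss_b = free_direction_kd_loss(p_a, p_b, directions)
```

In this toy input, network A is more confident in the correct class at the first node and network B at the other two, so knowledge flows from A to B at one node and from B to A at the rest. In a training loop, each distillation loss would be added to the corresponding network's supervised loss, with the teacher's probabilities treated as constants.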
