LG AI | Jul 20, 2024

Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation

Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li

arXiv:2407.14768v19.210 citations | h-index: 26 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses inefficient knowledge transfer in graph-based machine learning, which matters to practitioners who need lightweight models. The advance is incremental because it builds on existing distillation methods.

The paper tackles the performance bottleneck in GNN-to-MLP knowledge distillation by identifying and decoupling two types of hardness in knowledge samples, proposing a non-parametric framework (HGMD) that improves MLP performance by 12.95% over vanilla MLPs and outperforms teacher GNNs by 2.48% on average across seven datasets.

To bridge the gap between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP. In this paper, we revisit the knowledge samples (nodes) in teacher GNNs from the perspective of hardness, and identify that hard sample distillation may be a major performance bottleneck of existing graph KD algorithms. GNN-to-MLP KD involves two different types of hardness: a student-free knowledge hardness, which describes the inherent complexity of GNN knowledge, and a student-dependent distillation hardness, which describes the difficulty of teacher-to-student distillation. However, most existing work focuses on only one of these aspects or treats them as the same thing. This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework, which decouples the two hardnesses and estimates them using a non-parametric approach. Two hardness-aware distillation schemes (HGMD-weight and HGMD-mixup) are then proposed to distill hardness-aware knowledge from teacher GNNs into the corresponding nodes of student MLPs. As a non-parametric distillation method, HGMD does not involve any additional learnable parameters beyond the student MLPs, but it still outperforms most of the state-of-the-art competitors. HGMD-mixup improves over the vanilla MLPs by 12.95% and outperforms its teacher GNNs by 2.48%, averaged over seven real-world datasets.
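The distinction between the two kinds of hardness can be illustrated with a minimal sketch. The abstract does not give HGMD's formulas, so the proxies below are assumptions chosen for illustration only: the entropy of the teacher's prediction stands in for the student-free knowledge hardness, and the per-node KL divergence from teacher to student stands in for the student-dependent distillation hardness. The function names and the toy logits are hypothetical. Both proxies are non-parametric in the sense that they add no learnable parameters.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def knowledge_hardness(teacher_logits, eps=1e-12):
    # Assumed proxy: entropy of the teacher's predictive distribution.
    # It depends only on the teacher, so it is student-free.
    p = softmax(teacher_logits)
    return -(p * np.log(p + eps)).sum(axis=-1)

def distillation_hardness(teacher_logits, student_logits, eps=1e-12):
    # Assumed proxy: KL(teacher || student) per node.
    # It changes as the student trains, so it is student-dependent.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)

# Toy logits for two nodes and three classes (made-up values).
teacher = np.array([[4.0, 0.0, 0.0],    # teacher is confident
                    [0.1, 0.0, -0.1]])  # teacher is uncertain
student = np.array([[4.0, 0.0, 0.0],    # student matches the teacher
                    [2.0, 0.0, -2.0]])  # student disagrees with the teacher

kh = knowledge_hardness(teacher)
dh = distillation_hardness(teacher, student)
```

In this toy case the second node has the higher teacher entropy, and only the second node has a non-zero teacher-to-student divergence. The two quantities are computed separately, which is the decoupling the abstract describes; how HGMD-weight and HGMD-mixup actually use them is not specified here.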
