LG · Dec 2, 2021

Multi-task Self-distillation for Graph-based Semi-Supervised Learning

Yating Ren, Junzhong Ji, Lingfeng Niu, Minglong Lei

arXiv:2112.01174v311.39 citations

Originality: Incremental advance

AI Analysis

The paper addresses a common issue for graph neural networks in real-world applications, but the advance is incremental because it builds on existing graph convolutional methods.

The paper tackles mismatches between graph structures and labels in graph-based semi-supervised learning, which can propagate misleading features and degrade model performance. It proposes a multi-task self-distillation framework and reports performance gains under several classic graph convolutional architectures.

Graph convolutional networks have made great progress in graph-based semi-supervised learning. Existing methods mainly assume that nodes connected by graph edges tend to have similar attributes and labels, so that features smoothed by local graph structures can reveal class similarities. However, mismatches between graph structures and labels often exist in real-world scenarios, where the structures may propagate misleading features or labels that eventually degrade model performance. In this paper, we propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks to address the mismatch problem separately from the structure side and the label side. First, we formulate a self-supervision pipeline based on pretext tasks to capture different levels of similarity in graphs. The feature extraction process is encouraged to capture more complex proximity by jointly optimizing the pretext task and the target task. Consequently, the local feature aggregations are improved from the structure side. Second, self-distillation uses the soft labels of the model itself as additional supervision, which has an effect similar to label smoothing. The knowledge from the classification pipeline and the self-supervision pipeline is collectively distilled to improve the generalization ability of the model from the label side. Experimental results show that the proposed method obtains remarkable performance gains under several classic graph convolutional architectures.
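The abstract gives no formulas, so the following is only a minimal sketch of how a joint objective of this kind might be assembled. It is not the authors' implementation. All names (`multitask_self_distillation_loss`, `alpha`, `beta`, `temperature`) are hypothetical. The sketch also assumes that the pretext task is a classification problem, that the soft labels come from an earlier snapshot of the same model, and that only the classification pipeline is distilled, which simplifies the collective distillation described above.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with optional temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean cross-entropy between logits and integer class labels."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft labels (assumed: earlier snapshot)
    q = softmax(student_logits, temperature)
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1))


def multitask_self_distillation_loss(cls_logits, labels, labeled_mask,
                                     pretext_logits, pretext_targets,
                                     teacher_logits,
                                     alpha=0.5, beta=0.5, temperature=2.0):
    """Hypothetical joint objective: target + alpha * pretext + beta * distillation."""
    # Target task: supervised loss on the labeled nodes only (semi-supervised setting).
    target = cross_entropy(cls_logits[labeled_mask], labels[labeled_mask])
    # Pretext task: self-supervised loss computed on all nodes (structure side).
    pretext = cross_entropy(pretext_logits, pretext_targets)
    # Self-distillation: match the model's own soft labels on all nodes (label side).
    distill = distillation_loss(cls_logits, teacher_logits, temperature)
    return target + alpha * pretext + beta * distill
```

In this sketch the weights `alpha` and `beta` trade off the structure-side and label-side terms. When the teacher logits equal the current logits, the distillation term vanishes and the objective reduces to the plain multi-task loss.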
