LG, ML · Oct 7, 2019

Effective Stabilized Self-Training on Few-Labeled Graph Data

Ziang Zhou, Jieming Shi, Shengzhong Zhang, Zengfeng Huang, Qing Li

arXiv:1910.02684v47.118 citations

Originality: Incremental advance

AI Analysis

This work addresses a critical bottleneck in semi-supervised node classification on graph data, particularly in low-label scenarios. The advance is incremental, since it builds on existing GNN methods.

The paper tackles unstable training and performance degradation in graph neural networks (GNNs) when very few labeled nodes are available. It proposes a Stabilized Self-Training (SST) framework that boosts accuracy: for example, SSTGCN reaches 62.5% accuracy on Cora with 1 labeled node per class, 17.9% higher than GCN.

Graph neural networks (GNNs) are designed for semi-supervised node classification on graphs where only a subset of nodes have class labels. However, in extreme cases where very few labels are available (e.g., 1 labeled node per class), GNNs suffer from severe performance degradation. Specifically, we observe that existing GNNs suffer from an unstable training process on few-labeled graphs, resulting in inferior performance on node classification. Therefore, we propose an effective framework, Stabilized Self-Training (SST), which is applicable to existing GNNs to handle the scarcity of labeled data and, consequently, boost classification accuracy. We conduct thorough empirical and theoretical analysis to support our findings and motivate the algorithmic designs in SST. We apply SST to two popular GNN models, GCN and DAGNN, to obtain the SSTGCN and SSTDA methods respectively, and evaluate the two methods against 10 competitors over 5 benchmark datasets. Extensive experiments show that the proposed SST framework is highly effective, especially when few labeled data are available. Our methods achieve superior performance under almost all settings over all datasets. For instance, on the Cora dataset with only 1 labeled node per class, the accuracy of SSTGCN is 62.5%, 17.9% higher than GCN, and the accuracy of SSTDA is 66.4%, which outperforms DAGNN by 6.6%.
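The abstract does not spell out how SST works internally, so the following is only a minimal sketch of generic confidence-thresholded self-training on a graph, not the authors' SST algorithm or its stabilization techniques. It assumes a nearest-centroid classifier on graph-smoothed features as a stand-in for a GNN such as GCN; the function names (`propagate`, `predict_proba`, `self_train`), the toy graph, and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np

def propagate(adj, feats, hops=2):
    """Smooth features over the graph with a row-normalized (A + I)."""
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    out = feats
    for _ in range(hops):
        out = a_hat @ out
    return out

def predict_proba(x, labels, mask, n_classes):
    """Stand-in base model (assumption): nearest class centroid,
    with a softmax over negative squared distances as confidence."""
    centroids = np.stack(
        [x[mask & (labels == c)].mean(axis=0) for c in range(n_classes)]
    )
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(adj, feats, labels, labeled_mask, n_classes,
               rounds=5, threshold=0.7):
    """Generic self-training: repeatedly add confident predictions
    on unlabeled nodes to the training set as pseudo-labels."""
    x = propagate(adj, feats)
    y = labels.copy()
    mask = labeled_mask.copy()
    for _ in range(rounds):
        proba = predict_proba(x, y, mask, n_classes)
        conf = proba.max(axis=1)
        pred = proba.argmax(axis=1)
        new = (~mask) & (conf >= threshold)  # confident, not yet labeled
        if not new.any():
            break
        y[new] = pred[new]
        mask |= new
    return predict_proba(x, y, mask, n_classes).argmax(axis=1)

# Toy graph (hypothetical): two 4-node cliques joined by the edge 3-4.
n = 8
adj = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

# One-hot style features per community; only 1 labeled node per class.
feats = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4)
labels = np.full(n, -1)
labels[0], labels[4] = 0, 1
labeled_mask = labels >= 0

pred = self_train(adj, feats, labels, labeled_mask, n_classes=2)
print(pred)
```

In this sketch the labeled set only grows, so every class keeps at least one training node. A real few-label setup would replace the centroid model with a trained GNN and would need some safeguard against confidently wrong pseudo-labels, which is the kind of instability the paper says SST is designed to address.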
