LG · Aug 26, 2022

Toward Robust Graph Semi-Supervised Learning against Extreme Data Scarcity

Kaize Ding, Elnaz Nouri, Guoqing Zheng, Huan Liu, Ryen White

arXiv:2208.12422v2 · 11.820 citations · h-index: 64

Originality: Incremental advance

AI Analysis

This work addresses the challenge of replicable and sustainable graph-based web mining when labeled data is scarce, although it is an incremental improvement over existing self-training methods.

The paper tackles robust graph semi-supervised learning with very few labeled nodes by proposing AGST, a graph data augmentation framework that improves node-classification accuracy, with gains of up to 5-10% over baselines in low-data scenarios.

The success of graph neural networks in graph-based web mining relies heavily on abundant human-annotated data, which is laborious to obtain in practice. When only a few labeled nodes are available, improving the robustness of these models is key to achieving replicable and sustainable graph semi-supervised learning. Although self-training has proven powerful for semi-supervised learning, its application to graph-structured data may fail because (1) larger receptive fields are not leveraged to capture long-range node interactions, which exacerbates the difficulty of propagating feature-label patterns from labeled nodes to unlabeled nodes; and (2) limited labeled data makes it challenging to learn well-separated decision boundaries for different node classes without explicitly capturing the underlying semantic structure. To capture informative structural and semantic knowledge, we propose a new graph data augmentation framework, AGST (Augmented Graph Self-Training), which builds two new augmentation modules (structural and semantic) on top of a decoupled GST (graph self-training) backbone. In this work, we investigate whether this framework can learn a robust graph predictive model in the low-data context. We conduct comprehensive evaluations on semi-supervised node classification under different scenarios of limited labeled-node data. The experimental results demonstrate the unique contributions of the proposed data augmentation framework to node classification with few labeled data.
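The abstract builds AGST on a decoupled graph self-training backbone but does not spell out its details. The sketch below illustrates only that generic idea, under our own assumptions: feature propagation is decoupled from classification (SGC-style symmetric normalized propagation run once, where the number of steps `k` sets the receptive field), followed by confidence-thresholded pseudo-labeling. A nearest-centroid classifier stands in for a trained GNN head, and every name here (`propagate`, `self_train`, `threshold`, `temperature`) is hypothetical. It does not implement AGST's structural or semantic augmentation modules.

```python
import math
from typing import Dict, Hashable, List


def propagate(adj: List[List[int]], feats: List[List[float]], k: int) -> List[List[float]]:
    """k steps of X <- D^{-1/2}(A + I)D^{-1/2} X, run once before any classifier.
    Assumption: this SGC-style step stands in for the paper's decoupled backbone."""
    n = len(adj)
    deg = [len(adj[i]) + 1 for i in range(n)]  # +1 for the self-loop
    x = [list(row) for row in feats]
    for _ in range(k):
        new_x = []
        for i in range(n):
            acc = [v / deg[i] for v in x[i]]  # self-loop weight 1/d_i
            for j in adj[i]:
                w = 1.0 / math.sqrt(deg[i] * deg[j])
                acc = [a + w * v for a, v in zip(acc, x[j])]
            new_x.append(acc)
        x = new_x
    return x


def centroids(x: List[List[float]], labels: Dict[int, Hashable]) -> Dict[Hashable, List[float]]:
    """Mean propagated feature vector of each class over (pseudo-)labeled nodes."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for i, c in labels.items():
        if c not in sums:
            sums[c], counts[c] = [0.0] * len(x[i]), 0
        sums[c] = [s + v for s, v in zip(sums[c], x[i])]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def class_probs(xi: List[float], cents: Dict[Hashable, List[float]], temperature: float):
    """Softmax over negative squared distances to the class centroids."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(xi, mu)) / temperature
              for c, mu in cents.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def self_train(adj, feats, labels, k=2, rounds=3, threshold=0.9, temperature=0.1):
    """Hypothetical self-training loop: pseudo-label confident nodes, refit, repeat."""
    x = propagate(adj, feats, k)
    pseudo = dict(labels)  # ground-truth labels are never overwritten
    for _ in range(rounds):
        cents = centroids(x, pseudo)
        added = False
        for i in range(len(adj)):
            if i in pseudo:
                continue
            probs = class_probs(x[i], cents, temperature)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                pseudo[i] = best
                added = True
        if not added:  # no new node passed the threshold; stop early
            break
    cents = centroids(x, pseudo)
    preds = []
    for i in range(len(adj)):
        probs = class_probs(x[i], cents, temperature)
        preds.append(labels[i] if i in labels else max(probs, key=probs.get))
    return preds


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the bridge edge 2-3,
    # with a single labeled node per triangle.
    adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
    feats = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
    print(self_train(adj, feats, {0: "A", 5: "B"}))  # ['A', 'A', 'A', 'B', 'B', 'B']
```

In this toy run the first round pseudo-labels all four unlabeled nodes, and the second round adds nothing, so the loop stops early. Raising `k` widens each node's receptive field, which is the kind of long-range signal the abstract argues plain self-training fails to exploit.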

