LG | Aug 26, 2022

Toward Robust Graph Semi-Supervised Learning against Extreme Data Scarcity

arXiv:2208.12422v2 | 20 citations | h-index: 64
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of replicable and sustainable graph-based web mining when labeled data is scarce, though it is an incremental improvement over existing self-training methods.

The paper tackles the problem of robust graph semi-supervised learning with very few labeled nodes by proposing AGST, a data augmentation framework that improves node classification accuracy, achieving gains of up to 5-10% over baselines in low-data scenarios.

The success of graph neural networks in graph-based web mining relies heavily on abundant human-annotated data, which is laborious to obtain in practice. When only a few labeled nodes are available, improving the robustness of these models is key to achieving replicable and sustainable graph semi-supervised learning. Although self-training has been shown to be powerful for semi-supervised learning, its application to graph-structured data may fail because (1) larger receptive fields are not leveraged to capture long-range node interactions, which makes it harder to propagate feature-label patterns from labeled nodes to unlabeled nodes; and (2) limited labeled data makes it challenging to learn well-separated decision boundaries for different node classes without explicitly capturing the underlying semantic structure. To capture informative structural and semantic knowledge, we propose a new graph data augmentation framework, AGST (Augmented Graph Self-Training), which is built with two new augmentation modules (structural and semantic) on top of a decoupled GST backbone. We investigate whether this framework can learn a robust graph predictive model in the low-data setting, and we conduct comprehensive evaluations on semi-supervised node classification under different scenarios of limited labeled-node data. The experimental results demonstrate the unique contributions of the data augmentation framework to node classification with few labeled data.
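To make the terms "decoupled backbone", "receptive field", and "self-training" concrete, here is a minimal NumPy sketch of generic graph self-training. It is not the AGST implementation and omits both augmentation modules. The function names, the softmax classifier, the confidence `threshold`, and the toy two-chain graph are all assumptions chosen for illustration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(features, adj, hops):
    """Decoupled step: smooth features over `hops` hops, with no learned weights."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(hops):
        out = s @ out
    return out

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(x, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression trained by plain gradient descent."""
    w = np.zeros((x.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        w -= lr * x.T @ (softmax(x @ w) - onehot) / len(y)
    return w

def self_train(features, adj, labels, labeled_idx, n_classes,
               hops=4, rounds=3, threshold=0.7):
    """Generic self-training loop (illustrative, not the paper's algorithm)."""
    x = propagate(features, adj, hops)      # larger `hops` = larger receptive field
    train_idx = np.array(labeled_idx)
    train_y = labels[train_idx].copy()
    for _ in range(rounds):
        w = fit_softmax(x[train_idx], train_y, n_classes)
        probs = softmax(x @ w)
        confident = probs.max(axis=1) >= threshold
        confident[train_idx] = False        # skip nodes already in the training set
        new_idx = np.flatnonzero(confident)
        if new_idx.size == 0:
            break
        # Add confident predictions as pseudo-labels for the next round.
        train_idx = np.concatenate([train_idx, new_idx])
        train_y = np.concatenate([train_y, probs.argmax(axis=1)[new_idx]])
    w = fit_softmax(x[train_idx], train_y, n_classes)
    return softmax(x @ w).argmax(axis=1)

# Toy graph (hypothetical): two disconnected 4-node chains, one class per chain.
adj = np.zeros((8, 8))
for i, j in [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)]:
    adj[i, j] = adj[j, i] = 1.0
features = np.eye(8)                         # one-hot node identity features
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pred = self_train(features, adj, labels, labeled_idx=[0, 4], n_classes=2)
print(pred)
```

With only one labeled node per chain, the four-hop propagation gives every node some feature overlap with the labeled node in its own chain, so the classifier can separate the two chains. With `hops=1` and no self-training rounds, the far end of each chain shares no feature dimensions with any labeled node, which is a small-scale version of limitation (1) in the abstract.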

