Graph-Based Uncertainty-Aware Self-Training with Stochastic Node Labeling
This work addresses a key challenge in semi-supervised learning on graph data: over-confident pseudo-labels, which limit accuracy in low-label scenarios.
The paper tackles over-confident pseudo-labels in semi-supervised node classification by proposing a graph-based uncertainty-aware self-training framework with stochastic node labeling. It reports state-of-the-art performance on benchmark datasets, particularly when labeled data is extremely sparse.
Self-training has become a popular semi-supervised learning technique for leveraging unlabeled data. However, the over-confidence of pseudo-labels remains a key challenge. In this paper, we propose a novel \emph{graph-based uncertainty-aware self-training} (GUST) framework to combat over-confidence in node classification. Drawing inspiration from the uncertainty integration idea introduced by Wang \emph{et al.}~\cite{wang2024uncertainty}, our method departs from previous self-training approaches by focusing on \emph{stochastic node labeling} grounded in the graph topology. Specifically, we deploy a Bayesian-inspired module to estimate node-level uncertainty, incorporate these estimates into the pseudo-label generation process via an expectation-maximization (EM)-like step, and iteratively update both node embeddings and adjacency-based transformations. Experimental results on several benchmark graph datasets demonstrate that our GUST framework achieves state-of-the-art performance, especially in settings where labeled data is extremely sparse.
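To make the idea of uncertainty-aware stochastic labeling concrete, the following is a minimal sketch and not the GUST implementation. The abstract does not specify the uncertainty estimator or the sampling rule, so several things here are assumptions: node-level uncertainty is approximated by the entropy of class probabilities averaged over repeated stochastic forward passes, a node is pseudo-labeled with probability equal to one minus its normalized entropy, and its label is sampled from the averaged distribution rather than taken as the argmax. The function names and the `mc_probs` input format are hypothetical. The sketch omits the EM-like step and any use of the graph topology.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_distribution(samples):
    """Average several stochastic class-probability vectors for one node."""
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]


def stochastic_pseudo_labels(mc_probs, unlabeled, rng):
    """Sample pseudo-labels for unlabeled nodes, usually skipping uncertain ones.

    mc_probs[v] holds one class-probability vector per stochastic forward
    pass for node v (hypothetical input format, assumed for illustration).
    Returns {node: (label, weight)} with weight = 1 - normalized entropy.
    """
    labels = {}
    for v in unlabeled:
        mean = mean_distribution(mc_probs[v])
        max_entropy = math.log(len(mean))  # entropy of the uniform distribution
        confidence = 1.0 - predictive_entropy(mean) / max_entropy
        # Assumed acceptance rule: keep the node with probability `confidence`.
        if rng.random() < confidence:
            # Draw the label from the averaged distribution, not its argmax.
            label = rng.choices(range(len(mean)), weights=mean, k=1)[0]
            labels[v] = (label, confidence)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    mc_probs = {
        0: [[0.98, 0.01, 0.01], [0.96, 0.02, 0.02]],  # confident node
        1: [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]],  # near-uniform node
    }
    print(stochastic_pseudo_labels(mc_probs, [0, 1], rng))
```

Under these assumptions, a node whose averaged prediction is close to uniform is rarely pseudo-labeled, while a confidently predicted node is almost always labeled and carries a weight near 1. That weight could, for instance, scale the node's contribution to the training loss in the next self-training round.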