Consistent Semi-Supervised Graph Regularization for High Dimensional Data
The paper addresses a key limitation of graph-based semi-supervised learning on high-dimensional datasets: its inability to make effective use of unlabelled data.
The paper tackles the inconsistency problem of semi-supervised Laplacian regularization on high-dimensional data, where the method gains little from unlabelled data and, given enough of it, is outperformed by spectral clustering. It proposes a novel regularization with a centering operation, supported by theory and experiments.
Semi-supervised Laplacian regularization, a standard graph-based approach for learning from both labelled and unlabelled data, was recently shown to learn inefficiently from unlabelled data in high dimensions (Mai and Couillet 2018), causing it to be outperformed by its unsupervised counterpart, spectral clustering, given sufficient unlabelled data. Following a detailed discussion of the origin of this inconsistency problem, a novel regularization approach involving a centering operation is proposed as a solution, supported by both theoretical analysis and empirical results.
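To make the two ingredients concrete, here is a minimal NumPy sketch. `laplacian_scores` implements the classical harmonic solution of Laplacian regularization, which solves (D_uu - W_uu) f_u = W_ul f_l for the unlabelled scores. `centered_scores` is only one plausible reading of "regularization with centering": it double-centres the weight matrix and solves a shifted linear system. The abstract does not specify the formulation, so this variant, the function names, the Gaussian kernel, and the `alpha_margin` parameter are all assumptions made for illustration.

```python
import numpy as np


def gaussian_weights(X, sigma=1.0):
    """Dense Gaussian-kernel weights W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))


def laplacian_scores(W, f_l):
    """Classical Laplacian regularization (harmonic solution).

    The first len(f_l) rows/columns of W are the labelled points.
    Solves (D_uu - W_uu) f_u = W_ul f_l for the unlabelled scores f_u.
    """
    n_l = len(f_l)
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(L[n_l:, n_l:], W[n_l:, :n_l] @ f_l)


def centered_scores(W, f_l, alpha_margin=0.1):
    """Hypothetical centred variant (an assumption, not taken from the abstract).

    Double-centres W with P = I - 11^T / n, then solves
    (alpha I - W_c_uu) f_u = W_c_ul f_l, with alpha chosen slightly above
    the spectral norm of W_c_uu so that the system is positive definite.
    """
    n_l = len(f_l)
    n = W.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    W_c = P @ W @ P
    W_uu = W_c[n_l:, n_l:]
    alpha = (1.0 + alpha_margin) * np.linalg.norm(W_uu, 2)
    return np.linalg.solve(alpha * np.eye(n - n_l) - W_uu, W_c[n_l:, :n_l] @ f_l)


# Toy 1-D example: two labelled points (classes +1 and -1) followed by two
# unlabelled points, each lying next to one of the labelled points.
X = np.array([[0.0], [10.0], [1.0], [9.0]])
f_l = np.array([1.0, -1.0])
W = gaussian_weights(X)
f_std = laplacian_scores(W, f_l)
f_ctr = centered_scores(W, f_l)
```

On this toy input both score vectors take the sign of the nearest labelled point. The example is low-dimensional, so it only shows the mechanics; it does not reproduce the high-dimensional behaviour the paper analyses.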