Optimizing Diffusion Rate and Label Reliability in a Graph-Based Semi-supervised Classifier
This work offers incremental improvements to graph-based semi-supervised learning, focusing on the robustness and parameter tuning of one specific classifier.
The paper tackles overfitting and parameter selection in the Local and Global Consistency (LGC) algorithm for graph-based semi-supervised learning. It proposes removing the self-influence of labeled instances and optimizing the diffusion rate and label reliability, which reduces overfitting and yields performance competitive with robust L1-norm methods.
Semi-supervised learning has received attention from researchers, as it allows one to exploit the structure of unlabeled data to achieve competitive classification results with far fewer labels than supervised approaches. The Local and Global Consistency (LGC) algorithm is one of the best-known graph-based semi-supervised learning (GSSL) classifiers. Notably, its solution can be written as a linear combination of the known labels. The coefficients of this linear combination depend on a parameter $α$, which determines how the reward for reaching labeled vertices in a random walk decays over time. In this work, we discuss how removing the self-influence of a labeled instance may be beneficial, and how it relates to leave-one-out error. Moreover, we propose to minimize this leave-one-out loss with automatic differentiation. Within this framework, we propose methods to estimate label reliability and diffusion rate. Optimizing the diffusion rate is accomplished more efficiently with a spectral representation. Results show that the label reliability approach competes with robust L1-norm methods, and that removing diagonal entries reduces the risk of overfitting and leads to suitable criteria for parameter selection.
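To make the idea of removing self-influence concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard LGC closed form, in which the scores are $P Y$ with $P = (I - αS)^{-1}$ and $S = D^{-1/2} W D^{-1/2}$, and it treats zeroing the diagonal of $P$ as a proxy for leave-one-out prediction. The function names, the toy graph, and the choice $α = 0.5$ are all hypothetical, chosen only for illustration.

```python
import numpy as np

def lgc_propagation_matrix(W, alpha):
    """Return P = (I - alpha * S)^{-1}, where S = D^{-1/2} W D^{-1/2}.

    Assumes W is a symmetric, non-negative affinity matrix with no
    isolated vertices, and 0 < alpha < 1.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.inv(np.eye(W.shape[0]) - alpha * S)

def propagated_scores(P, Y, remove_self_influence=True):
    """Scores as a linear combination of the known labels (rows of Y).

    With remove_self_influence=True the diagonal of P is zeroed, so a
    labeled vertex cannot vote for its own label.
    """
    if remove_self_influence:
        P = P - np.diag(np.diag(P))
    return P @ Y

def loo_error(W, Y, labeled, alpha):
    """Fraction of labeled vertices misclassified without self-influence."""
    F = propagated_scores(lgc_propagation_matrix(W, alpha), Y)
    predicted = F[labeled].argmax(axis=1)
    given = Y[labeled].argmax(axis=1)
    return float(np.mean(predicted != given))

# Hypothetical toy graph: two triangles joined by one weak edge (2-3).
W = np.zeros((6, 6))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.05)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

# One-hot labels for the labeled vertices; unlabeled rows stay zero.
labeled = np.array([0, 2, 3, 5])
Y = np.zeros((6, 2))
Y[[0, 2], 0] = 1.0
Y[[3, 5], 1] = 1.0

err = loo_error(W, Y, labeled, alpha=0.5)
print("leave-one-out error:", err)
```

In this sketch the diagonal-free error is a function of $α$ alone, so it could in principle serve as a selection criterion by scanning $α$ over a grid. The abstract instead describes minimizing a leave-one-out loss with automatic differentiation, which would require a differentiable surrogate of this 0/1 error and an autodiff framework rather than plain NumPy.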