On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms
This provides theoretical guarantees for scalable semi-supervised learning methods, addressing a key bottleneck in large-scale applications.
The paper tackles the problem of ensuring consistency in graph-based Bayesian learning as unlabeled data grows, and proves that this leads to scalable MCMC algorithms with uniform spectral gaps, supported by numerical experiments.
A popular approach to semi-supervised learning proceeds by endowing the input data with a graph structure in order to extract geometric information and incorporate it into a Bayesian framework. We introduce new theory that gives appropriate scalings of graph parameters that provably lead to a well-defined limiting posterior as the size of the unlabeled data set grows. Furthermore, we show that these consistency results have profound algorithmic implications. When consistency holds, carefully designed graph-based Markov chain Monte Carlo algorithms are proved to have a uniform spectral gap, independent of the number of unlabeled inputs. Several numerical experiments corroborate both the statistical consistency and the algorithmic scalability established by the theory.