ML LGJul 25, 2020

Posterior Consistency of Semi-Supervised Regression on Graphs

Andrea L. Bertozzi, Bamdad Hosseini, Hao Li, Kevin Miller, Andrew M. Stuart

arXiv:2007.12809v28.311 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the reliability of semi-supervised learning on graphs for applications like data classification, but it is incremental as it builds on existing Bayesian formulations and focuses on theoretical analysis without introducing new methods.

The paper tackles the problem of ensuring consistency in graph-based semi-supervised regression for classification, analyzing how the posterior measure contracts around the ground truth with bounds on rates and numerical validation.

Graph-based semi-supervised regression (SSR) is the problem of estimating the value of a function on a weighted graph from its values (labels) on a small subset of the vertices. This paper is concerned with the consistency of SSR in the context of classification, in the setting where the labels have small noise and the underlying graph weighting is consistent with well-clustered nodes. We present a Bayesian formulation of SSR in which the weighted graph defines a Gaussian prior, using a graph Laplacian, and the labeled data defines a likelihood. We analyze the rate of contraction of the posterior measure around the ground truth in terms of parameters that quantify the small label error and inherent clustering in the graph. We obtain bounds on the rates of contraction and illustrate their sharpness through numerical experiments. The analysis also gives insight into the choice of hyperparameters that enter the definition of the prior.

View on arXiv PDF

Similar