A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning
This work addresses a theoretical flaw in a widely used method for graph-based semi-supervised classification, offering improved accuracy for practitioners in network analysis.
The paper tackles the inconsistency of a popular heat diffusion algorithm for semi-supervised graph learning: it proves that centering the node temperatures before scoring makes the algorithm consistent on a block model, and it reports significant performance gains on real graphs.
The task of semi-supervised classification aims at assigning labels to all nodes of a graph based on the labels known for a few nodes, called the seeds. One of the most popular algorithms relies on the principle of heat diffusion, where the labels of the seeds are spread by thermoconductance and the temperature of each node at equilibrium is used as a score function for each label. In this paper, we prove that this algorithm is not consistent unless the temperatures of the nodes at equilibrium are centered before scoring. This crucial step not only makes the algorithm provably consistent on a block model but also brings significant performance gains on real graphs.
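To make the procedure concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a small, connected, undirected graph given as a dense NumPy adjacency matrix, with at least one node that is not a seed. For each label, seeds carrying that label are held at temperature 1 and all other seeds at 0; the equilibrium temperature of every remaining node is the average of its neighbors' temperatures. It further assumes that "centering" means subtracting, for each label, the mean temperature over all nodes. The names `diffusion_temperatures` and `classify` are hypothetical.

```python
import numpy as np


def diffusion_temperatures(adjacency, seeds, n_labels):
    """Equilibrium temperatures, one column per label.

    adjacency: symmetric (n, n) array of non-negative edge weights.
    seeds: dict mapping seed node index -> label in range(n_labels).
    Assumes the graph is connected and has at least one non-seed node.
    """
    n = adjacency.shape[0]
    seed_idx = np.array(sorted(seeds), dtype=int)
    free_idx = np.array([i for i in range(n) if i not in seeds], dtype=int)

    # Boundary condition: a seed is at temperature 1 for its own label, 0 otherwise.
    temperatures = np.zeros((n, n_labels))
    for node, label in seeds.items():
        temperatures[node, label] = 1.0

    # At equilibrium each free node is the weighted average of its neighbors,
    # which gives the linear system L_ff T_f = A_fs T_s.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    l_ff = laplacian[np.ix_(free_idx, free_idx)]
    a_fs = adjacency[np.ix_(free_idx, seed_idx)]
    temperatures[free_idx] = np.linalg.solve(l_ff, a_fs @ temperatures[seed_idx])
    return temperatures


def classify(temperatures, center=True):
    """Pick the label with the highest score for each node.

    With center=True, each label's mean temperature over all nodes is
    subtracted before scoring (the assumed form of the centering step).
    """
    scores = temperatures - temperatures.mean(axis=0) if center else temperatures
    return scores.argmax(axis=1)


# Toy example: a path 0-1-2-3-4 with one seed of each label at the two ends.
path = np.zeros((5, 5))
for i in range(4):
    path[i, i + 1] = path[i + 1, i] = 1.0

temps = diffusion_temperatures(path, seeds={0: 0, 4: 1}, n_labels=2)
labels = classify(temps, center=True)
```

On this symmetric toy graph the centered and uncentered scores lead to the same predictions; the centering step matters in less balanced settings, which this sketch does not attempt to reproduce.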