Gradient scarcity with Bilevel Optimization for Graph Learning
This work addresses a fundamental bottleneck in semi-supervised graph learning that matters to both researchers and practitioners, although it is incremental, since it builds on prior work on gradient scarcity.
The paper tackles gradient scarcity in graph learning, where edges far from labeled nodes receive zero gradients. It characterizes the phenomenon mathematically, shows that it also arises in bilevel optimization, and shows that with Laplacian regularization the gradient amplitude decreases exponentially with distance to labeled nodes. It proposes remedies such as latent graph learning, graph regularization, and optimizing on a larger graph with a reduced diameter, and validates them with experiments on synthetic and real datasets.
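To make the receptive-field argument concrete, here is a minimal sketch that assumes a toy linear propagation out = (I + A)^K x on a path graph. It has no normalization, no nonlinearity, and no learned weights, so it is a simplification and not the paper's GCN. The helper names `path_adjacency` and `edge_gradients` are hypothetical. The sketch computes the exact gradient of a squared loss at the single labeled node 0 with respect to every edge weight, using the product rule for matrix powers.

```python
import numpy as np


def path_adjacency(weights):
    """Weighted adjacency of a path graph; weights[i] is the edge (i, i+1)."""
    n = len(weights) + 1
    A = np.zeros((n, n))
    for i, w in enumerate(weights):
        A[i, i + 1] = A[i + 1, i] = w
    return A


def edge_gradients(weights, x, y, num_layers, labeled=0):
    """Exact gradient of L = (out[labeled] - y)^2 w.r.t. every edge weight,
    where out = (I + A)^K x is a toy linear, unnormalized K-layer propagation."""
    A = path_adjacency(weights)
    M = np.eye(len(x)) + A
    powers = [np.linalg.matrix_power(M, k) for k in range(num_layers)]
    out = (np.linalg.matrix_power(M, num_layers) @ x)[labeled]
    residual = 2.0 * (out - y)
    grads = np.zeros(len(weights))
    for e in range(len(weights)):
        E = np.zeros_like(A)  # dA / dw_e: symmetric unit entries for edge (e, e+1)
        E[e, e + 1] = E[e + 1, e] = 1.0
        # Product rule for matrix powers: d(M^K) = sum_k M^k E M^(K-1-k)
        dM = sum(powers[k] @ E @ powers[num_layers - 1 - k] for k in range(num_layers))
        grads[e] = residual * (dM @ x)[labeled]
    return grads


weights = np.ones(7)     # path graph with 8 nodes, unit edge weights
x = np.arange(1.0, 9.0)  # one scalar feature per node
g = edge_gradients(weights, x, y=0.0, num_layers=2)
print(g)  # values: 162, 54, then five zeros
```

With K = 2 layers, only edges (0, 1) and (1, 2) receive a nonzero gradient; every edge more than K hops away from the labeled node gets exactly zero, and raising K to 3 extends the nonzero set by one edge.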
A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity: when a graph is learned by minimizing a loss on a subset of nodes, edges between unlabelled nodes that are far from labelled ones receive zero gradients. The phenomenon was first described when jointly optimizing the graph and the weights of a Graph Convolutional Network (GCN). In this work, we give a precise mathematical characterization of this phenomenon and prove that it also emerges in bilevel optimization, where additional dependencies exist between the parameters of the problem. While for GCNs gradient scarcity occurs because of their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that the gradient amplitude decreases exponentially with distance to labelled nodes. To alleviate this issue, we study several solutions: latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, and optimizing on a larger graph than the original one with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and demonstrate the effectiveness of the proposed solutions.
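For the Laplacian regularization case, the following sketch assumes one common quadratic formulation, f*(w) = argmin_f ||f - x||^2 + lam * f^T L_w f = (I + lam L_w)^{-1} x, with an outer squared loss on the labelled nodes. The paper's exact inner problem may differ. The hypergradient with respect to each edge weight is obtained by implicit differentiation with a single adjoint solve. The names `laplacian` and `hypergradient` and the alternating toy signal are illustrative assumptions. Comparing a path graph with the same path plus one shortcut edge also hints at why reducing the diameter helps.

```python
import numpy as np


def laplacian(n, edges, weights):
    """Weighted combinatorial Laplacian L = D - A from an undirected edge list."""
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def hypergradient(n, edges, weights, x, labeled, y, lam=1.0):
    """Gradient of the outer loss sum_l (f*_l - y_l)^2 w.r.t. each edge weight,
    assuming f* = argmin_f ||f - x||^2 + lam * f^T L f = (I + lam L)^{-1} x."""
    K = np.eye(n) + lam * laplacian(n, edges, weights)
    f = np.linalg.solve(K, x)               # inner solution f*
    r = np.zeros(n)
    r[labeled] = 2.0 * (f[labeled] - y)     # dOuter/df, nonzero only on labelled nodes
    a = np.linalg.solve(K, r)               # adjoint vector (K is symmetric)
    # Implicit differentiation: dOuter/dw_ij = -lam * (f_i - f_j) * (a_i - a_j)
    return np.array([-lam * (f[i] - f[j]) * (a[i] - a[j]) for i, j in edges])


n = 30
path = [(i, i + 1) for i in range(n - 1)]
x = np.array([(-1.0) ** i for i in range(n)])  # toy node signal
g = hypergradient(n, path, np.ones(n - 1), x, labeled=[0], y=np.array([0.0]))
print(np.abs(g[:6]))  # magnitudes shrink by a roughly constant factor per hop

# Same graph plus a shortcut edge (0, 20), which reduces the diameter.
shortcut = path + [(0, 20)]
g_s = hypergradient(n, shortcut, np.ones(n), x, labeled=[0], y=np.array([0.0]))
print(abs(g[20]), abs(g_s[20]))  # edge (20, 21) without / with the shortcut
```

On the plain path, the gradients never vanish exactly, unlike the GCN case, but their magnitude shrinks by a roughly constant factor per hop (close to (3 - √5)/2 ≈ 0.38 for lam = 1 and unit weights), so far edges quickly fall below any useful step size. Adding the shortcut puts edge (20, 21) within two hops of the labelled node and raises its gradient by several orders of magnitude.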