LGMLJun 24, 2020

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

arXiv:2006.13866v298 citations
AI Analysis

This addresses a bottleneck in scaling GNN training for large graphs, offering a method to improve efficiency and performance, though it is incremental as it builds on existing sampling techniques.

The paper tackles the problem of high variance in stochastic gradients for training Graph Neural Networks (GNNs) on large graphs, which causes slow convergence and poor generalization, by proposing a decoupled variance reduction strategy that uses gradient information for adaptive sampling and reduces embedding approximation variance, resulting in faster convergence and better generalization with smaller mini-batch sizes.

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes