34.2DCMay 10
ATLAS: Efficient Out-of-Core Inference for Billion-Scale Graph Neural NetworksPranjal Naman, Yogesh Simmhan
Graph Neural Network (GNN) inference on billion-scale graphs is critical for domains like fintech and recommendation systems. Full-graph inference on these large graphs can be challenging due to high communication costs in distributed settings and high I/O costs in disk-backed Out-of-Core (OOC) settings. Existing OOC systems, operating across disk and memory, primarily focus on GNN training and perform poorly for full-graph inference due to massive read amplification, irregular I/O, and memory pressure. We present ATLAS, a disk-based GNN inference framework that enables efficient full-graph, layer-wise inference on graphs whose topologies, features and intermediate embeddings exceed the available memory on single machines. ATLAS replaces gather-based execution with a broadcast-based model that enables sequential, single-pass streaming reads of features and embeddings per layer. A tiered memory-disk hierarchy with minimum-pending-message eviction, graph reordering and a GPU-accelerated pipeline sustains high throughput within $128$ GiB RAM and $2$ TiB SSD. Across out-of-core graphs with up to $4$B edges and $550$ GiB features and multiple GNN architectures, ATLAS improves end-to-end inference time by $\approx12$--$30\times$ over State-of-the-Art (SOTA) OOC baselines on a single workstation, while remaining within $\approx5\%$ when features fit in memory.
DCJun 14, 2025
Optimizing Federated Learning using Remote Embeddings for Graph Neural NetworksPranjal Naman, Yogesh Simmhan
Graph Neural Networks (GNNs) have experienced rapid advancements in recent years due to their ability to learn meaningful representations from graph data structures. Federated Learning (FL) has emerged as a viable machine learning approach for training a shared model on decentralized data, addressing privacy concerns while leveraging parallelism. Existing methods that address the unique requirements of federated GNN training using remote embeddings to enhance convergence accuracy are limited by their diminished performance due to large communication costs with a shared embedding server. In this paper, we present OpES, an optimized federated GNN training framework that uses remote neighbourhood pruning, and overlaps pushing of embeddings to the server with local training to reduce the network costs and training time. The modest drop in per-round accuracy due to pre-emptive push of embeddings is out-stripped by the reduction in per-round training time for large and dense graphs like Reddit and Products, converging up to $\approx2\times$ faster than the state-of-the-art technique using an embedding server and giving up to $20\%$ better accuracy than vanilla federated GNN learning.