Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication
This addresses the memory bottleneck and trainability issues in scaling GNNs for graph learning tasks, offering a data-centric solution that improves performance without architectural changes.
The paper tackles the problem of scaling Graph Neural Networks (GNNs) without deepening or widening, which often leads to issues like unhealthy gradients and over-smoothing, by proposing Graph Ladling, a method that divides graph data into subgraphs, trains multiple GNNs in parallel without intermediate communication, and merges them using a greedy interpolation soup procedure, achieving state-of-the-art performance across multiple small and large graphs.
Graphs are omnipresent and GNNs are a powerful family of neural networks for learning over graphs. Despite their popularity, scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNNs capacity without deepening or widening, which can improve its performance across multiple small and large graphs. Motivated by the recent intriguing phenomenon of model soups, which suggest that fine-tuned weights of multiple large-language pre-trained models can be merged to a better minima, we argue to exploit the fundamentals of model soups to mitigate the aforementioned issues of memory bottleneck and trainability during GNNs scaling. More specifically, we propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs, i.e., to build powerful GNNs. By dividing giant graph data, we build multiple independently and parallelly trained weaker GNNs (soup ingredient) without any intermediate communication, and combine their strength using a greedy interpolation soup procedure to achieve state-of-the-art performance. Compared to concurrent distributed GNN training works such as Jiong et. al. 2023, we train each soup ingredient by sampling different subgraphs per epoch and their respective sub-models are merged only after being fully trained (rather than intermediately so). Moreover, we provide a wide variety of model soup preparation techniques by leveraging state-of-the-art graph sampling and graph partitioning approaches that can handle large graphs. Codes are available at: \url{https://github.com/VITA-Group/graph_ladling}.