LG AI SIMay 22, 2025

Scalable Graph Generative Modeling via Substructure Sequences

Zehong Wang, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye

arXiv:2505.16130v219.712 citationsh-index: 24Has Code

Originality Highly original

AI Analysis

It addresses scalability issues in graph learning for researchers and practitioners, offering a novel approach beyond message-passing.

The paper tackles the scalability limitations of message-passing graph neural networks by introducing G^2PM, a generative Transformer framework that represents graphs as sequences of substructures, achieving improved performance on benchmarks like ogbn-arxiv with models up to 60M parameters.

Graph neural networks (GNNs) have been predominantly driven by message-passing, where node representations are iteratively updated via local neighborhood aggregation. Despite their success, message-passing suffers from fundamental limitations -- including constrained expressiveness, over-smoothing, over-squashing, and limited capacity to model long-range dependencies. These issues hinder scalability: increasing data size or model size often fails to yield improved performance. To this end, we explore pathways beyond message-passing and introduce Generative Graph Pattern Machine (G$^2$PM), a generative Transformer pre-training framework for graphs. G$^2$PM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures, and employs generative pre-training over the sequences to learn generalizable and transferable representations. Empirically, G$^2$PM demonstrates strong scalability: on the ogbn-arxiv benchmark, it continues to improve with model sizes up to 60M parameters, outperforming prior generative approaches that plateau at significantly smaller scales (e.g., 3M). In addition, we systematically analyze the model design space, highlighting key architectural choices that contribute to its scalability and generalization. Across diverse tasks -- including node/link/graph classification, transfer learning, and cross-graph pretraining -- G$^2$PM consistently outperforms strong baselines, establishing a compelling foundation for scalable graph learning. The code and dataset are available at https://github.com/Zehong-Wang/G2PM.

View on arXiv PDF Code

Similar