GPT-GNN: Generative Pre-Training of Graph Neural Networks
GPT-GNN addresses the cost of acquiring labeled data for GNNs in domains such as academic networks and e-commerce, offering a way to reduce labeling effort. The contribution is incremental in that it builds on existing pre-training and GNN concepts.
The paper tackles the problem of training graph neural networks (GNNs) with limited labeled data. It introduces GPT-GNN, a framework for generative pre-training on unlabeled graphs, and reports improvements of up to 9.1% over state-of-the-art GNNs trained without pre-training on downstream tasks over academic graph and recommendation data.
Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data. However, training GNNs usually requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce the labeling effort is to pre-train an expressive GNN model on unlabeled data with self-supervision and then transfer the learned model to downstream tasks with only a few labels. In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph. We factorize the likelihood of the graph generation into two components: 1) Attribute Generation and 2) Edge Generation. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on the billion-scale Open Academic Graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 9.1% across various downstream tasks.
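The two-part factorization described in the abstract can be illustrated as a pre-training objective that sums an attribute term and an edge term. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`attribute_loss`, `edge_loss`, `pretraining_loss`), the squared-error attribute term, and the softmax contrastive edge term with sampled negatives are all illustrative choices, and the embeddings are assumed to come from some GNN encoder that is not shown.

```python
import numpy as np

def attribute_loss(pred_attr, true_attr):
    """Attribute Generation term (assumed form): mean squared error between
    the attributes predicted for a held-out node and its true attributes."""
    return float(np.mean((pred_attr - true_attr) ** 2))

def edge_loss(node_emb, pos_emb, neg_emb):
    """Edge Generation term (assumed form): softmax contrastive loss that
    asks the true neighbor to score higher than sampled non-neighbors.

    node_emb: (d,) embedding of the node being generated
    pos_emb:  (d,) embedding of one true neighbor
    neg_emb:  (k, d) embeddings of k sampled non-neighbors
    """
    # Stack candidates so that row 0 is the true neighbor.
    candidates = np.vstack([pos_emb[None, :], neg_emb])
    scores = candidates @ node_emb
    # Numerically stable log-softmax over the candidates.
    scores = scores - scores.max()
    log_probs = scores - np.log(np.exp(scores).sum())
    return float(-log_probs[0])

def pretraining_loss(pred_attr, true_attr, node_emb, pos_emb, neg_emb):
    """Sum of the two terms, mirroring the factorization into attribute
    generation and edge generation."""
    return attribute_loss(pred_attr, true_attr) + edge_loss(node_emb, pos_emb, neg_emb)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    true_attr = rng.normal(size=d)
    pred_attr = true_attr + 0.1 * rng.normal(size=d)
    node_emb = rng.normal(size=d)
    pos_emb = node_emb + 0.1 * rng.normal(size=d)
    neg_emb = rng.normal(size=(5, d))
    print(pretraining_loss(pred_attr, true_attr, node_emb, pos_emb, neg_emb))
```

Summing the two terms means a single encoder receives gradient signal from both node attributes and graph structure, which is the dependency the abstract says the generative process is meant to capture. The equal weighting of the terms is another assumption of this sketch.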