Towards Foundation Models on Graphs: An Analysis on Cross-Dataset Transfer of Pretrained GNNs
This work addresses the challenge of creating versatile graph models for machine learning applications, but it is incremental as it builds on existing structural pretraining methods.
The paper tackles the problem of developing foundation models for graphs by analyzing how pretrained Graph Neural Networks transfer across datasets. It finds that pretrained embeddings improve generalization only with sufficient downstream data, to a degree that depends on the quantity and properties of the pretraining data. Feature information helps, but currently requires some similarity between the pretraining and downstream feature spaces.
To develop a preliminary understanding of Graph Foundation Models, we study the extent to which pretrained Graph Neural Networks can be applied across datasets, which requires them to be agnostic to dataset-specific features and their encodings. We build upon a purely structural pretraining approach and propose an extension that captures feature information while remaining feature-agnostic. We evaluate pretrained models on downstream tasks for varying numbers of training samples and choices of pretraining datasets. Our preliminary results indicate that embeddings from pretrained models improve generalization only with enough downstream data points, and to a degree that depends on the quantity and properties of the pretraining data. Feature information can lead to improvements, but currently requires some similarity between the pretraining and downstream feature spaces.
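The evaluation protocol described above, downstream performance as a function of the number of training samples, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy two-cluster "embeddings" stand in for the output of a frozen pretrained GNN, and the nearest-centroid classifier, the function names (`make_toy_embeddings`, `learning_curve`), and the sample sizes are hypothetical rather than taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def make_toy_embeddings(n_per_class: int, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical stand-in for frozen pretrained embeddings: two 2-D clusters."""
    rng = random.Random(seed)
    embeddings: List[Vector] = []
    labels: List[int] = []
    for label, center in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            embeddings.append([center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)])
            labels.append(label)
    return embeddings, labels


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learning_curve(
    embeddings: Sequence[Vector],
    labels: Sequence[int],
    per_class_sizes: Sequence[int],
    seed: int = 0,
) -> Dict[int, float]:
    """Test accuracy of a nearest-centroid classifier for each training size.

    For every size n, n samples per class are drawn for training and all
    remaining samples are used for testing.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    results: Dict[int, float] = {}
    for n in per_class_sizes:
        if any(n >= len(idxs) for idxs in by_class.values()):
            raise ValueError("each class needs more than n samples so a test set remains")
        # Stratified draw: n training indices from every class.
        train_idx = set()
        for idxs in by_class.values():
            train_idx.update(rng.sample(idxs, n))
        # "Training" here is just computing one centroid per class.
        centroids = {
            y: centroid([embeddings[i] for i in idxs if i in train_idx])
            for y, idxs in by_class.items()
        }
        test_idx = [i for i in range(len(labels)) if i not in train_idx]
        correct = 0
        for i in test_idx:
            predicted = min(centroids, key=lambda y: sq_dist(embeddings[i], centroids[y]))
            if predicted == labels[i]:
                correct += 1
        results[n] = correct / len(test_idx)
    return results


if __name__ == "__main__":
    emb, lab = make_toy_embeddings(n_per_class=20, seed=1)
    for size, acc in learning_curve(emb, lab, per_class_sizes=[1, 5, 10]).items():
        print(f"{size} training samples per class: test accuracy {acc:.2f}")
```

In an actual study the toy data would be replaced by embeddings from each pretrained model, and the resulting curves would be compared across pretraining datasets and against a baseline trained without pretraining.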