LG, AI, Nov 7, 2023

Topology Only Pre-Training: Towards Generalised Multi-Domain Graph Models

arXiv:2311.03976v4, 3 citations, h-index: 16
Originality: Highly original
AI Analysis

This work addresses the challenge of transfer learning for graph models in domains with scarce data or labels, and it represents a novel paradigm rather than an incremental improvement.

The paper tackles the problem of domain-specific graph representation learning by introducing Topology Only Pre-Training (ToP), which excludes node and edge features during pre-training so that a model can transfer across multiple domains, including unseen ones. ToP models perform significantly better than a supervised baseline on 75% of experiments, and performance is significantly positive on 85.7% of tasks when node and edge features are used in fine-tuning.

The principal benefit of unsupervised representation learning is that a pre-trained model can be fine-tuned where data or labels are scarce. Existing approaches for graph representation learning are domain specific, maintaining consistent node and edge features across the pre-training and target datasets. This has precluded transfer to multiple domains. We present Topology Only Pre-Training (ToP), a graph pre-training method based on node and edge feature exclusion. We show positive transfer on evaluation datasets from multiple domains, including domains not present in pre-training data, running directly contrary to assumptions made in contemporary works. On 75% of experiments, ToP models perform significantly ($p \leq 0.01$) better than a supervised baseline. Performance is significantly positive on 85.7% of tasks when node and edge features are used in fine-tuning. We further show that out-of-domain topologies can produce more useful pre-training than in-domain topologies. Under ToP we show better transfer from non-molecule pre-training, compared to molecule pre-training, on 79% of molecular benchmarks. Against the limited set of other generalist graph models ToP performs strongly, including against models many orders of magnitude larger. These findings show that ToP opens broad areas of research in both transfer learning on scarcely populated graph domains and in graph foundation models.
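To make the idea of node and edge feature exclusion concrete, the following is a minimal sketch and not the authors' implementation. The `Graph` container, the `exclude_features` helper, and the constant placeholder value `1.0` are all assumptions introduced for illustration. The sketch only shows that graphs from different domains, whose feature dimensions differ, share one format once features are dropped and only topology is kept.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Graph:
    """Hypothetical graph container (not from the paper)."""
    num_nodes: int
    edges: List[Tuple[int, int]]
    node_features: Optional[List[List[float]]] = None
    edge_features: Optional[List[List[float]]] = None


def exclude_features(graph: Graph) -> Graph:
    """Return a topology-only copy of `graph`.

    Domain-specific node and edge features are discarded and replaced
    with a constant placeholder (an assumption of this sketch), so the
    only information left is which nodes are connected.
    """
    return Graph(
        num_nodes=graph.num_nodes,
        edges=list(graph.edges),
        node_features=[[1.0] for _ in range(graph.num_nodes)],
        edge_features=[[1.0] for _ in graph.edges],
    )


# A toy "molecule" with 2-dimensional node features and bond-order edge features.
molecule = Graph(
    num_nodes=3,
    edges=[(0, 1), (1, 2)],
    node_features=[[6.0, 0.0], [8.0, 1.0], [1.0, 0.0]],
    edge_features=[[1.0], [2.0]],
)

# A toy "social" graph with 5-dimensional node features and no edge features.
social = Graph(
    num_nodes=3,
    edges=[(0, 1), (1, 2)],
    node_features=[[0.1] * 5 for _ in range(3)],
)

molecule_topology = exclude_features(molecule)
social_topology = exclude_features(social)

# Both graphs are paths on three nodes, so their topology-only forms coincide
# even though their original feature spaces are incompatible.
same_format = molecule_topology == social_topology
```

Because the two toy graphs have the same connectivity, their topology-only versions compare equal, which is why a single pre-training corpus can mix domains under this kind of scheme. How features are reintroduced during fine-tuning is not shown here.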

Code Implementations: 1 repo