CL AI

Jun 26, 2025

Towards Text-free Graph Foundation Models: Rethinking Multi-Domain Graph Contrastive Learning

Zihao Zhao, Xinlong Zhai, Jinyu Yang, Chuan Shi

arXiv:2506.22510v1, 4 citations, h-index: 2

Originality: Highly original

AI Analysis

This work addresses the problem of ineffective multi-domain knowledge integration in graph learning for applications like social networks and recommendation systems, representing an incremental advance with a novel method for a known bottleneck.

The paper tackles the challenge of building graph foundation models for text-free graphs by addressing the semantic gaps between different domains, proposing a multi-domain contrastive learning framework that improves accuracy by up to 19.33% and Macro-F1 by up to 19.13% over state-of-the-art methods.

Foundation models have achieved great success in natural language processing (NLP) and computer vision (CV). Their success largely stems from the ability to integrate multi-domain knowledge in pre-training and transfer it to target domains. Because graph data, especially graphs without textual features, is ubiquitous in real-world applications such as social networks and recommendation systems, some researchers have attempted to extend this paradigm to the graph field, aiming to construct graph foundation models. However, unlike in CV and NLP, there are huge gaps among the semantics and properties of graphs in different domains, yet current works still adopt traditional contrastive pre-training strategies designed for the single-domain scenario, which treat contrastive samples from different domains as equivalent. Through experimental investigations, we found that inherent domain-specific differences prevent these strategies from effectively absorbing knowledge from different domains to generate informative representations. In this paper, we propose a novel multi-domain pre-training and cross-domain transfer framework, namely MDGCL. In the pre-training stage, we design a contrastive learning strategy to substantially recognize and capture domain differences, and introduce domain tokens to encode domain-level global information. In the downstream stage, we introduce a domain attention mechanism to enable fine-grained domain knowledge transfer. Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms state-of-the-art methods, with maximum improvements of 19.33% in accuracy and 19.13% in Macro-F1 score.
