LG Jun 15, 2024

Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, Jiliang Tang

arXiv:2406.10727v1 26.849 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a critical gap for researchers and practitioners in graph machine learning by providing standardized tools to evaluate text-space GFMs, though it is incremental as it focuses on benchmarking rather than introducing a new method.

The authors tackled the lack of comprehensive benchmarks and datasets for text-space Graph Foundation Models (GFMs), which aim to unify diverse graph data using natural language features, by creating a new benchmark with novel datasets and evaluations. Their empirical results provided new insights and inspired future research directions, with code and data made publicly available.

Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interest. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models that align different modalities with natural language, text has recently been adopted to provide a unified feature space for diverse graphs. Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods' full potential and verify their effectiveness across diverse settings. To address these issues, we conduct a comprehensive benchmark providing novel text-space datasets and comprehensive evaluation under unified problem settings. Empirical results provide new insights and inspire future research directions. Our code and data are publicly available at https://github.com/CurryTang/TSGFM.
