CL AI LG | Dec 13, 2024

Can LLMs Convert Graphs to Text-Attributed Graphs?

Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye

arXiv:2412.10136v2 | 15.429 citations | h-index: 24 | Has Code | NAACL

Originality: Incremental advance

AI Analysis

This work addresses the limited availability of text-attributed graph data for researchers and practitioners in fields such as drug discovery and social network analysis. It is an incremental advance that applies LLMs to graph preprocessing.

The paper tackles the problem of converting existing graphs into text-attributed graphs, motivated by the difficulty of cross-graph learning. It proposes TANS, which uses LLMs to integrate topological information into node descriptions, and reports significant performance improvements on text-free graphs over methods that rely on manually designed node features.

Graphs are ubiquitous structures found in numerous real-world applications, such as drug discovery, recommender systems, and social network analysis. To model graph-structured data, graph neural networks (GNNs) have become a popular tool. However, existing GNN architectures encounter challenges in cross-graph learning where multiple graphs have different feature spaces. To address this, recent approaches introduce text-attributed graphs (TAGs), where each node is associated with a textual description, which can be projected into a unified feature space using textual encoders. While promising, this method relies heavily on the availability of text-attributed graph data, which is difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), leveraging large language models (LLMs) to convert existing graphs into text-attributed graphs. The key idea is to integrate topological information into LLMs to explain how graph topology influences node semantics. We evaluate TANS on text-rich, text-limited, and text-free graphs, demonstrating its applicability. Notably, on text-free graphs, our method significantly outperforms existing approaches that manually design node features, showcasing the potential of LLMs for preprocessing graph-structured data in the absence of textual information. The code and data are available at https://github.com/Zehong-Wang/TANS.
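
The key idea in the abstract, feeding topological information to an LLM so it can describe a node, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of properties (degree and local clustering coefficient), the function names `topology_stats` and `build_prompt`, and the prompt wording are all assumptions made for illustration, and no LLM is actually called.

```python
from itertools import combinations


def topology_stats(adj, node):
    """Return the degree and local clustering coefficient of `node`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    These two properties are illustrative assumptions, not the paper's list.
    """
    neighbors = adj[node]
    degree = len(neighbors)
    if degree < 2:
        # Clustering is undefined for fewer than two neighbors; use 0.0.
        return {"degree": degree, "clustering": 0.0}
    # Count how many pairs of neighbors are themselves connected.
    links = sum(1 for u, v in combinations(sorted(neighbors), 2) if v in adj[u])
    possible = degree * (degree - 1) // 2
    return {"degree": degree, "clustering": links / possible}


def build_prompt(adj, node, node_text=None):
    """Assemble a hypothetical LLM prompt that embeds topological properties."""
    stats = topology_stats(adj, node)
    lines = [
        f"Node {node} has degree {stats['degree']} and "
        f"local clustering coefficient {stats['clustering']:.2f}."
    ]
    if node_text:
        # Text-rich or text-limited graphs can pass along existing text.
        lines.append(f"Existing text: {node_text}")
    lines.append(
        "Describe the likely role of this node and explain how its "
        "topology informs that role."
    )
    return "\n".join(lines)


# Toy undirected graph: a triangle (a, b, c) plus a pendant node d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
prompt = build_prompt(adj, "a")
print(prompt)
```

For a text-free graph, `node_text` is simply omitted, so the generated description would rest on topology alone; the resulting text could then be passed to a text encoder to obtain node features in a shared space.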

