AI · LG · Feb 14, 2025

GraphiT: Efficient Node Classification on Text-Attributed Graphs with Prompt Optimized LLMs

arXiv:2502.10522v1 · 6 citations · h-index: 7 · WWW
Originality: Incremental advance
AI Analysis

This paper addresses the problem of inefficient and unreliable LLM-based node classification on text-attributed graphs, offering researchers and practitioners an incremental improvement through automated prompt optimization.

The paper tackles two challenges: efficiently encoding graph structure and features for LLMs on text-attributed graphs, and the dependence on manual prompt adjustments. It proposes GraphiT, which encodes graphs into text and optimizes prompts programmatically. GraphiT outperforms LLM baselines on three datasets; the optimization step improves results without manual tweaking, and the encoding reaches competitive performance with fewer tokens.

The application of large language models (LLMs) to graph data has attracted a lot of attention recently. LLMs allow us to use deep contextual embeddings from pretrained models in text-attributed graphs, where shallow embeddings are often used for the text attributes of nodes. However, it is still challenging to efficiently encode the graph structure and features into a sequential form for use by LLMs. In addition, the performance of an LLM alone is highly dependent on the structure of the input prompt, which limits its effectiveness as a reliable approach and often requires iterative manual adjustments that can be slow, tedious, and difficult to replicate programmatically. In this paper, we propose GraphiT (Graphs in Text), a framework for encoding graphs into a textual format and optimizing LLM prompts for graph prediction tasks. Here we focus on node classification for text-attributed graphs. We encode the graph data for every node and its neighborhood into a concise text to enable LLMs to better utilize the information in the graph. We then programmatically optimize the LLM prompts using the DSPy framework to automate this step and make it more efficient and reproducible. GraphiT outperforms our LLM-based baselines on three datasets, and we show how the optimization step in GraphiT leads to measurably better results without manual prompt tweaking. We also demonstrate that our graph encoding approach is competitive with other graph encoding methods while being less expensive because it uses significantly fewer tokens for the same task.
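To make the idea of encoding a node and its neighborhood into concise text more concrete, here is a minimal Python sketch. It is not the encoding used in the paper: the function name `encode_node`, its parameters, the output layout, and the choice to summarize neighbor labels as counts are all assumptions made for illustration. It only shows one plausible way to keep the text short by clipping long texts and capping how many neighbors are included.

```python
from collections import Counter


def encode_node(node_id, texts, neighbors, labels, max_neighbors=3, max_chars=80):
    """Render one node and its 1-hop neighborhood as a short text block.

    Hypothetical helper: names and output layout are illustrative
    assumptions, not the encoding described in the paper.
    """
    def clip(s):
        # Truncate long texts so the prompt stays small.
        return s if len(s) <= max_chars else s[:max_chars - 3] + "..."

    lines = [f"Target node text: {clip(texts[node_id])}"]

    # Summarize known neighbor labels as counts instead of listing each neighbor.
    known = Counter(labels[n] for n in neighbors.get(node_id, []) if n in labels)
    if known:
        summary = ", ".join(f"{lab} x{cnt}" for lab, cnt in known.most_common())
        lines.append(f"Neighbor labels: {summary}")

    # Include only the first few neighbor texts to limit the token count.
    for n in neighbors.get(node_id, [])[:max_neighbors]:
        lines.append(f"Neighbor text: {clip(texts[n])}")

    return "\n".join(lines)


# Toy text-attributed graph (made-up data for illustration only).
texts = {
    0: "Graph neural networks for citation analysis",
    1: "Convolutional networks on graphs",
    2: "Bayesian inference for topic models",
}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
labels = {1: "Neural_Networks", 2: "Probabilistic_Methods"}

prompt_text = encode_node(0, texts, neighbors, labels)
print(prompt_text)
```

The resulting string could then serve as the input field of an LLM prompt, with the prompt wording itself left to an optimizer such as DSPy rather than tuned by hand.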

