GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability
This work addresses the challenge of applying LLMs to graph data, which is common in real-world domains. It represents an incremental advance in general AI capabilities.
The paper aims to improve the ability of large language models (LLMs) to understand and reason over graph-structured data. It introduces GraphInstruct, a dynamic benchmark with 21 graph reasoning tasks, and develops two models, GraphSolver and GraphSolver+, which outperform other open-source LLMs.
Improving the general capabilities of large language models (LLMs) is an active research topic. Graphs are a common data structure in many real-world domains, so understanding graph data is a crucial part of advancing general intelligence. To this end, we propose GraphInstruct, a dynamic benchmark that covers 21 classical graph reasoning tasks and provides diverse graph generation pipelines and detailed intermediate reasoning steps for each sample. Based on GraphInstruct, we develop GraphSolver via efficient instruction-tuning; it demonstrates stronger graph understanding capability than other open-source LLMs. To further endow LLMs with multi-step graph reasoning capability, we propose a label-mask training strategy and build GraphSolver+, which leverages masked supervision on intermediate reasoning tokens to emphasize crucial node-identification signals. This work is one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, and extensive experiments demonstrate the superiority of GraphSolver and GraphSolver+ over other LLMs. We hope GraphInstruct will facilitate further research on applying LLMs to graph-structured data. Our code and data are publicly available at: https://github.com/CGCL-codes/GraphInstruct.
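To make the two ideas in the abstract concrete, the following is a minimal sketch, not the released GraphInstruct code. It assumes that a dynamically generated sample amounts to a random graph plus a recorded BFS trace serving as intermediate reasoning steps, and that label masking means dropping supervision on every token that is not a node identifier. The names `make_sample` and `mask_labels`, the step wording, and the whitespace tokenization are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import random
from collections import deque

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def make_sample(num_nodes, edge_prob, seed):
    """Hypothetical generator: random undirected graph plus a BFS trace from node 0."""
    rng = random.Random(seed)  # seeded so every sample is reproducible
    edges = [(u, v)
             for u in range(num_nodes)
             for v in range(u + 1, num_nodes)
             if rng.random() < edge_prob]
    adj = {u: [] for u in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Record each BFS expansion as one intermediate reasoning step.
    steps, visited, queue = [], {0}, deque([0])
    while queue:
        u = queue.popleft()
        new = [v for v in sorted(adj[u]) if v not in visited]
        visited.update(new)
        queue.extend(new)
        steps.append(f"visit {u} -> discover {new}")
    return {"edges": edges, "steps": steps, "answer": sorted(visited)}


def mask_labels(tokens, token_ids):
    """Assumed masking rule: keep labels only on node-ID tokens, ignore the rest."""
    return [tid if tok.isdigit() else IGNORE_INDEX
            for tok, tid in zip(tokens, token_ids)]


sample = make_sample(num_nodes=6, edge_prob=0.5, seed=0)
print(sample["steps"])

# Only the node identifiers "3" and "4" keep their labels.
print(mask_labels(["visit", "3", "->", "discover", "4"], [11, 12, 13, 14, 15]))
# -> [-100, 12, -100, -100, 15]
```

Because the generator is seeded, new samples can be drawn on demand, which is one plausible reading of "dynamic" here. The masking rule is only an illustration of supervising node-identification tokens; the paper's actual strategy may differ.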