AI, CL | Jul 3, 2024

GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models

arXiv:2407.02936v2 | 23.025 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

GraCoRe gives researchers and practitioners a comprehensive benchmark for evaluating LLMs on graph tasks, though the contribution is incremental because it builds on existing evaluation frameworks.

The authors address the challenge of evaluating graph comprehension and reasoning in Large Language Models by introducing GraCoRe, a benchmark that systematically assesses these abilities across 10 areas and 19 tasks using 5,140 graphs. They find that the OpenAI o1 model excels and that semantic enrichment and node ordering both affect performance.

Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs' graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graphs and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that the OpenAI o1 model has amazing comprehension and reasoning capabilities, semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning. GraCoRe is open-sourced at https://github.com/ZIKEYUAN/GraCoRe
