KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
This work addresses the need for better methods of evaluating how knowledge graphs are integrated with LLMs, though its contribution is incremental because it builds on existing textualization approaches.
The authors study how different textualization strategies affect large language model (LLM) performance on knowledge graph reasoning tasks. They introduce KG-LLM-Bench, a scalable benchmark spanning five tasks, and report insights from experiments with seven models and five textualization strategies.
Knowledge graphs have emerged as a popular method for injecting up-to-date, factual knowledge into large language models (LLMs). This is typically achieved by converting the knowledge graph into text that the LLM can process in context. While multiple methods of encoding knowledge graphs have been proposed, the impact of this textualization process on LLM performance remains under-explored. We introduce KG-LLM-Bench, a comprehensive and extensible benchmark spanning five knowledge graph understanding tasks, and evaluate how different encoding strategies affect performance across various base models. Our extensive experiments with seven language models and five textualization strategies provide insights for optimizing LLM performance on KG reasoning tasks.
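To make "textualization" concrete, the sketch below renders the same toy graph in two ways: as a flat list of edges and as JSON grouped by subject. The triples, the function names, and both formats are illustrative assumptions. The abstract does not name the five strategies the benchmark evaluates, so these should not be read as the benchmark's own encodings.

```python
import json
from collections import defaultdict

# Hypothetical toy graph as (subject, relation, object) triples.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "European Union"),
    ("Paris", "located_in", "Europe"),
]


def to_edge_list(triples):
    """Render one triple per line (an assumed flat encoding)."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)


def to_nested_json(triples):
    """Group triples by subject, then relation, and serialize as JSON
    (an assumed structured encoding)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, r, o in triples:
        grouped[s][r].append(o)
    return json.dumps(grouped, indent=2)


if __name__ == "__main__":
    # Either string could be placed in an LLM prompt as context.
    print(to_edge_list(TRIPLES))
    print(to_nested_json(TRIPLES))
```

Both outputs carry the same facts, but they differ in length and in how related edges are grouped, which is the kind of variation whose effect on model performance the benchmark is designed to measure.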