Assessing SPARQL capabilities of Large Language Models
This work addresses the integration of LLMs with knowledge graphs for Semantic Web applications. Its contribution is incremental: it benchmarks existing models and does not introduce new methods.
The paper assesses the out-of-the-box capabilities of large language models (LLMs) in working with SPARQL SELECT queries for knowledge graphs. It finds that the top models fix basic syntax errors without difficulty, whereas creating semantically correct queries remains challenging, with results varying by model and task complexity.
The integration of Large Language Models (LLMs) with Knowledge Graphs (KGs) offers significant synergistic potential for knowledge-driven applications. One possible integration is the interpretation and generation of formal languages, such as those used in the Semantic Web, with SPARQL being a core technology for accessing KGs. In this paper, we take a quantitative approach to measuring the out-of-the-box capabilities of LLMs to work with SPARQL, and more specifically with SPARQL SELECT queries. We implemented various benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation with several LLMs. The tasks assess capabilities along the dimensions of syntax, semantic read, semantic create, and the role of knowledge graph prompt inclusion. With these new benchmarking tasks, we evaluated a selection of GPT, Gemini, and Claude models. Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs and heavily depends on the specific LLM as well as the complexity of the task. While fixing basic syntax errors seems to pose no problems for the best of the current LLMs evaluated, creating semantically correct SPARQL SELECT queries is difficult in several cases.
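To make the "semantic create" dimension more concrete, the following is a minimal sketch of one way a generated SELECT query could be scored: by comparing the rows it returns against the rows of a reference query. The abstract does not state how the tasks are actually scored, so the F1 metric, the function name `result_set_f1`, and the representation of rows as tuples of strings are all assumptions made for illustration. None of this is the LLM-KG-Bench API.

```python
from typing import Iterable, Set, Tuple

# Assumption: each result row is a tuple of stringified bindings.
Row = Tuple[str, ...]


def result_set_f1(expected: Iterable[Row], actual: Iterable[Row]) -> float:
    """Return the F1 overlap of two SELECT result sets.

    Hypothetical scoring helper: rows are compared as unordered sets,
    so row order and duplicate rows are ignored.
    """
    expected_set: Set[Row] = set(expected)
    actual_set: Set[Row] = set(actual)

    # Two empty result sets are treated as a perfect match.
    if not expected_set and not actual_set:
        return 1.0

    true_positives = len(expected_set & actual_set)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(actual_set)
    recall = true_positives / len(expected_set)
    return 2 * precision * recall / (precision + recall)


# Example: the generated query returns only one of two expected rows.
reference_rows = [("http://example.org/alice",), ("http://example.org/bob",)]
generated_rows = [("http://example.org/alice",)]
score = result_set_f1(reference_rows, generated_rows)
```

Comparing result sets rather than query text would let two syntactically different but equivalent queries receive the same score, which is one reason a benchmark might prefer it. The example IRIs are placeholders.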