Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
The work addresses a critical issue for knowledge graph applications: factual errors in SPARQL queries can undermine real-world information retrieval.
The paper tackles the problem of hallucinations in language model-based SPARQL query generation by introducing PGMR, a modular framework that uses post-generation memory retrieval to enhance accuracy. The results show that PGMR significantly reduces URI hallucinations, nearly eliminating them in some scenarios.
The ability to generate SPARQL queries from natural language questions is crucial for ensuring efficient and accurate retrieval of structured data from knowledge graphs (KGs). While large language models (LLMs) have been widely adopted for SPARQL query generation, they are often susceptible to hallucinations and out-of-distribution errors when producing KG elements like Uniform Resource Identifiers (URIs) based on internal parametric knowledge. These errors often yield content that appears plausible but is factually incorrect, posing significant challenges for the use of LLMs in real-world information retrieval (IR) applications, and they have prompted increased research aimed at detecting and mitigating them. In this paper, we introduce PGMR (Post-Generation Memory Retrieval), a modular framework that incorporates a non-parametric memory module to retrieve KG elements and enhance LLM-based SPARQL query generation. Our experimental results indicate that PGMR consistently delivers strong performance across diverse datasets, data distributions, and LLMs. Notably, PGMR significantly mitigates URI hallucinations, nearly eliminating the problem in several scenarios.
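To make the idea of post-generation memory retrieval concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the LLM emits a draft query in which KG elements appear as natural-language labels inside a hypothetical `<<...>>` delimiter, and that a small in-memory dictionary stands in for the non-parametric memory. The function names, the delimiter, and the string-similarity lookup are all illustrative assumptions; the abstract does not specify how PGMR performs retrieval.

```python
import difflib
import re

# Hypothetical non-parametric memory mapping labels to KG identifiers.
# Wikidata prefixed names are used purely for illustration.
MEMORY = {
    "Albert Einstein": "wd:Q937",
    "place of birth": "wdt:P19",
    "date of birth": "wdt:P569",
}

# Assumed placeholder syntax for labels in the LLM's draft query.
PLACEHOLDER = re.compile(r"<<(.+?)>>")


def retrieve_uri(label, memory=MEMORY, cutoff=0.6):
    """Return the URI whose stored label is most similar to `label`.

    Uses simple string similarity as a stand-in for whatever retriever
    a real system would use. Raises KeyError if nothing is close enough.
    """
    lowered = {key.lower(): key for key in memory}
    matches = difflib.get_close_matches(
        label.lower(), list(lowered), n=1, cutoff=cutoff
    )
    if not matches:
        raise KeyError(f"no memory entry close to {label!r}")
    return memory[lowered[matches[0]]]


def ground_query(draft, memory=MEMORY):
    """Replace every <<label>> in the draft with a URI taken from memory."""
    return PLACEHOLDER.sub(lambda m: retrieve_uri(m.group(1), memory), draft)


# Draft as an LLM might produce it, including a misspelled label.
draft = "SELECT ?place WHERE { <<Albert Einstien>> <<place of birth>> ?place . }"
print(ground_query(draft))
```

Because every identifier in the final query is copied from the memory rather than generated token by token, the model cannot invent a URI that does not exist in the store; a label with no close match fails loudly instead of producing a plausible-looking but wrong identifier.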