xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models
This work addresses the needs of non-technical users who must validate linked data in Knowledge Graphs, but the contribution is incremental: it builds on existing SHACL validation and adds explainability.
The paper tackles a usability problem: SHACL validation reports are difficult for non-technical users to interpret. It introduces xpSHACL, a system that combines rule-based justification trees with retrieval-augmented generation and large language models to produce detailed, multilingual, human-readable explanations of constraint violations. A key feature is a Violation KG that caches and reuses explanations to improve efficiency and consistency.
Shapes Constraint Language (SHACL) is a powerful language for validating RDF data. Given the recent industry attention to Knowledge Graphs (KGs), more users need to validate linked data properly. However, traditional SHACL validation engines often provide terse reports in English that are difficult for non-technical users to interpret and act upon. This paper presents xpSHACL, an explainable SHACL validation system that addresses this issue by combining rule-based justification trees with retrieval-augmented generation (RAG) and large language models (LLMs) to produce detailed, multilingual, human-readable explanations for constraint violations. A key feature of xpSHACL is its use of a Violation KG to cache and reuse explanations, improving efficiency and consistency.
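To make the caching idea concrete, here is a minimal sketch of how a justification tree and an explanation cache might fit together. It is not xpSHACL's implementation. The abstract does not say how the Violation KG keys its entries, so the sketch assumes that violations sharing the same shape, path, and constraint component (fields that SHACL validation results carry as `sh:sourceShape`, `sh:resultPath`, and `sh:sourceConstraintComponent`) can reuse one explanation per language. The names `Violation`, `JustificationNode`, `build_justification`, and `ViolationCache` are hypothetical. A plain dictionary stands in for the Violation KG, the retrieval step of RAG is omitted, and the LLM is replaced by an injected `generate` function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Violation:
    """One SHACL validation result, mirroring fields of sh:ValidationResult."""
    focus_node: str
    result_path: str
    source_shape: str
    constraint_component: str
    value: Optional[str] = None


@dataclass
class JustificationNode:
    """A node in a hypothetical rule-based justification tree."""
    statement: str
    children: List["JustificationNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> List[str]:
        # Depth-first rendering, indenting each level by two spaces.
        lines = ["  " * depth + self.statement]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


def build_justification(v: Violation) -> JustificationNode:
    # Root names the failed constraint; children cite the evidence that was checked.
    evidence = [JustificationNode(f"checked path {v.result_path}")]
    if v.value is not None:
        evidence.append(JustificationNode(f"offending value {v.value}"))
    return JustificationNode(
        f"{v.focus_node} fails {v.constraint_component} in {v.source_shape}",
        evidence,
    )


Signature = Tuple[str, str, str]


class ViolationCache:
    """In-memory stand-in for the Violation KG (assumed keying, not the paper's schema)."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate  # hypothetical LLM call: prompt -> explanation text
        self._store: Dict[Tuple[Signature, str], str] = {}
        self.llm_calls = 0

    @staticmethod
    def signature(v: Violation) -> Signature:
        # Violations that differ only in focus node or value share one explanation.
        return (v.source_shape, v.result_path, v.constraint_component)

    def explain(self, v: Violation, language: str = "en") -> str:
        key = (self.signature(v), language)
        if key not in self._store:
            # Cache miss: generate a generic explanation once per signature and language.
            shape, path, component = key[0]
            prompt = (f"In language '{language}', explain why a node fails "
                      f"{component} on path {path} of shape {shape}.")
            self._store[key] = self._generate(prompt)
            self.llm_calls += 1
        # The justification tree is cheap and violation-specific, so it is rebuilt each time.
        tree = "\n".join(build_justification(v).render())
        return f"{tree}\n{self._store[key]}"


if __name__ == "__main__":
    cache = ViolationCache(generate=lambda prompt: f"[LLM explanation for: {prompt}]")
    a = Violation("ex:alice", "ex:email", "ex:PersonShape", "sh:MinCountConstraintComponent")
    b = Violation("ex:bob", "ex:email", "ex:PersonShape", "sh:MinCountConstraintComponent")
    print(cache.explain(a))
    print(cache.explain(b))
    print("LLM calls:", cache.llm_calls)  # 1: b reuses the explanation cached for a
```

The sketch separates two concerns. The rule-based tree is deterministic and specific to each violation, so it is rebuilt on every call. The natural-language text is the expensive, LLM-generated part, so it is cached. Because the cache key includes the language, a request for the same violation in another language triggers a new generation instead of reusing the English text.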