Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph
For practitioners working with large RDF datasets that use named graphs, this work demonstrates the practical feasibility and performance benefits of SHACL-DS over the standard SHACL approach.
The paper applies SHACL-DS to validate a large-scale RDF Knowledge Graph (ERA RINF) and shows that SHACL-DS is faster than the SHACL baseline while providing additional features like per-graph validation and provenance tracking.
SHACL-DS extends SHACL for RDF dataset validation by introducing declarative targeting of named graphs and graph combinations, but it has not yet been demonstrated and assessed on a real, large-scale Knowledge Graph (KG). In this paper, we apply SHACL-DS to the European Railway Infrastructure (ERA RINF) KG, a large-scale RDF dataset in which 56 infrastructure managers contribute data to dedicated named graphs. We migrate the ERA RINF shapes to SHACL-DS using two strategies and evaluate their performance using a TopBraid SHACL-DS implementation developed for this study. We compare both strategies against the standard SHACL approach, which "flattens" all graphs into a single data graph. Both strategies produce the same results and are faster than the SHACL baseline. Beyond demonstrating that SHACL-DS is at least as expressive as SHACL, we show that it allows the validation scope to be declared inside the shapes artefact, enforces triple provenance through \texttt{GRAPH} clauses, enriches validation reports with per-graph annotations, and enables shape organisation across named shapes graphs.
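To make the contrast between the flattened baseline and per-graph validation concrete, the following is a minimal sketch in plain Python. It is not SHACL-DS syntax and not the TopBraid implementation used in the study: the dataset layout, the IRIs, the rule "every ex:Track needs an ex:length", and the function names are all hypothetical and chosen only for illustration.

```python
# Toy dataset: named graph IRI -> set of (subject, predicate, object) triples.
# All IRIs and the constraint below are hypothetical, for illustration only.
DATASET = {
    "urn:graph:im-a": {
        ("urn:track:1", "rdf:type", "ex:Track"),
        ("urn:track:1", "ex:length", "1200"),
    },
    "urn:graph:im-b": {
        ("urn:track:2", "rdf:type", "ex:Track"),  # no ex:length triple
    },
}


def missing_length(triples):
    """Return sorted subjects typed ex:Track that have no ex:length value."""
    tracks = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Track"}
    with_length = {s for s, p, o in triples if p == "ex:length"}
    return sorted(tracks - with_length)


def validate_flattened(dataset):
    """Baseline: merge all named graphs into one data graph, validate once.

    The graph each triple came from is no longer known after merging.
    """
    merged = set().union(*dataset.values())
    return [{"focus": s} for s in missing_length(merged)]


def validate_per_graph(dataset):
    """Validate each named graph on its own and annotate every result
    with the graph in which the violation was found."""
    report = []
    for graph in sorted(dataset):
        for s in missing_length(dataset[graph]):
            report.append({"focus": s, "graph": graph})
    return report


if __name__ == "__main__":
    print(validate_flattened(DATASET))
    print(validate_per_graph(DATASET))
```

On this toy data both functions flag the same node, but only the per-graph report records which named graph the violation belongs to, which is the kind of annotation the abstract refers to.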