DB · May 11

Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph

Davan Chiem Dao, Ghislain Atemezing, Christophe Debruyne

arXiv:2605.10540 · 30.5

Predicted impact: top 52% in DB · last 90 days

Originality: Synthesis-oriented

AI Analysis

For practitioners working with large RDF datasets that use named graphs, this work demonstrates the practical feasibility and performance benefits of SHACL-DS over the standard SHACL approach.

The paper applies SHACL-DS to validate a large-scale RDF Knowledge Graph (ERA RINF) and shows that SHACL-DS is faster than the SHACL baseline while providing additional features like per-graph validation and provenance tracking.

SHACL-DS extends SHACL for RDF dataset validation by introducing declarative targeting of named graphs and graph combinations, but it has not yet been demonstrated and assessed on a real, large-scale Knowledge Graph (KG). In this paper, we apply SHACL-DS to the European Railway Infrastructure (ERA RINF) KG, a large-scale RDF dataset in which 56 infrastructure managers contribute data to dedicated named graphs. We migrate the ERA RINF shapes to SHACL-DS using two strategies and evaluate their performance using a TopBraid SHACL-DS implementation developed for this study. We compare the performance against the SHACL approach, which "flattens" all graphs into a single data graph. Both strategies produce the same results and are faster than the SHACL baseline. Not only do we demonstrate that SHACL-DS is at least as expressive as SHACL, but SHACL-DS also allows the validation scope to be declared inside the shapes artefact, enforces triple provenance through GRAPH clauses, enriches validation reports with per-graph annotations, and enables shape organisation across named shapes graphs.
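The difference between the "flattened" baseline and per-graph validation can be illustrated with a toy sketch in plain Python. This is not SHACL or SHACL-DS syntax, and it is not the TopBraid implementation used in the paper. The graph names, the `ex:` terms, the "every track needs a length" rule, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]  # named graph IRI -> triples in that graph

RDF_TYPE = "rdf:type"


def missing_length(triples: Set[Triple]) -> List[str]:
    """Hypothetical constraint: every ex:Track has at least one ex:length."""
    tracks = {s for s, p, o in triples if p == RDF_TYPE and o == "ex:Track"}
    with_length = {s for s, p, _ in triples if p == "ex:length"}
    return sorted(tracks - with_length)


def validate_flattened(dataset: Dataset) -> List[str]:
    """Baseline idea: merge all named graphs into one data graph first."""
    merged: Set[Triple] = set().union(*dataset.values())
    return missing_length(merged)


def validate_per_graph(dataset: Dataset) -> Dict[str, List[str]]:
    """Per-graph idea: check each named graph alone and keep its name."""
    report: Dict[str, List[str]] = {}
    for graph_name in sorted(dataset):
        violations = missing_length(dataset[graph_name])
        if violations:
            report[graph_name] = violations
    return report


# Two hypothetical infrastructure-manager graphs.
dataset: Dataset = {
    "ex:graph/im-a": {
        ("ex:t1", RDF_TYPE, "ex:Track"),
        ("ex:t1", "ex:length", "120"),
    },
    "ex:graph/im-b": {
        ("ex:t2", RDF_TYPE, "ex:Track"),
    },
}

flat_report = validate_flattened(dataset)
graph_report = validate_per_graph(dataset)
print(flat_report)
print(graph_report)
```

In this sketch both approaches flag the same node, but only the per-graph report records which named graph the violation came from, which is the kind of per-graph annotation the abstract describes.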
