DB, AI, PL | Jul 12, 2021

ProGS: Property Graph Shapes Language (Extended Version)

arXiv:2107.05566v1 | 1 citation
Originality: Incremental advance
AI Analysis

ProGS addresses a gap in ensuring data quality for knowledge graph applications, though the advance is incremental: it adapts existing shape language ideas from RDF to property graphs.

The paper tackles domain constraint violations in property graphs, which are used for knowledge graphs such as Wikidata. It introduces ProGS, a shape language for validating property graphs against such constraints, and contributes a formal semantics, a complexity analysis, and a prototypical validator implementation.

Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata triples. The second type of error, violation of domain constraints, has not been addressed with regard to property graphs so far. In RDF representations, this error can be addressed by shape languages such as SHACL or ShEx, which allow for checking whether graphs are valid with respect to a set of domain constraints. Borrowing ideas from the syntax and semantics definitions of SHACL, we design a shape language for property graphs, ProGS, which allows for formulating shape constraints on property graphs including their specific constructs, such as edges with identities and key-value annotations to both nodes and edges. We define a formal semantics of ProGS, investigate the resulting complexity of validating property graphs against sets of ProGS shapes, compare with corresponding results for SHACL, and implement a prototypical validator that utilizes answer set programming.
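To make the idea of a shape constraint over a property graph concrete, here is a minimal Python sketch. It is not ProGS syntax, not the paper's semantics, and not the authors' answer-set-programming validator. Every name in it (`Node`, `Edge`, `PropertyGraph`, `violations`, and the `Person`/`worksFor`/`since` example) is a hypothetical illustration. It only shows the kind of check the abstract describes: a constraint that refers to node labels, to edges with their own identities, and to key-value annotations on edges.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set = field(default_factory=set)
    props: dict = field(default_factory=dict)   # key-value annotations on the node


@dataclass
class Edge:
    id: str                                     # edges carry their own identity
    src: str
    dst: str
    label: str
    props: dict = field(default_factory=dict)   # key-value annotations on the edge


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        self.edges[edge.id] = edge

    def out_edges(self, node_id, label):
        """All edges with the given label that start at node_id."""
        return [e for e in self.edges.values()
                if e.src == node_id and e.label == label]


def violations(graph, target_label, edge_label, required_edge_key, min_count=1):
    """Return the sorted ids of nodes labelled target_label that have fewer
    than min_count outgoing edge_label edges annotated with required_edge_key.
    (A toy constraint chosen for illustration, not a ProGS shape.)"""
    bad = []
    for node in graph.nodes.values():
        if target_label not in node.labels:
            continue  # the constraint only targets nodes with this label
        qualifying = [e for e in graph.out_edges(node.id, edge_label)
                      if required_edge_key in e.props]
        if len(qualifying) < min_count:
            bad.append(node.id)
    return sorted(bad)


# Hypothetical example data: every Person needs a worksFor edge with a "since" key.
g = PropertyGraph()
g.add_node(Node("alice", {"Person"}))
g.add_node(Node("bob", {"Person"}))
g.add_node(Node("acme", {"Company"}))
g.add_edge(Edge("e1", "alice", "acme", "worksFor", {"since": 2019}))
g.add_edge(Edge("e2", "bob", "acme", "worksFor"))   # annotation missing

print(violations(g, "Person", "worksFor", "since"))
```

In this toy graph only `bob` is reported, because his `worksFor` edge lacks the `since` annotation. The `acme` node is never checked because it does not carry the targeted label.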

Code Implementations: 1 repo
