Maxime Jakubowski

DB
7papers
17citations
Novelty40%
AI Score48

7 Papers

68.0DBJun 2
A Community Survey on SHACL and ShEx: Briding Gaps in RDF Validation

Maxime Jakubowski, Dominik Tomaszuk, Katja Hose

This paper examines RDF validation practices and challenges to understand stakeholder applications, their needs, and identify areas for improvement in technologies and methodologies, thereby guiding future research and standardization efforts. A community survey was conducted, targeting a diverse group of RDF validation technology users across academia and industry. The survey collected data on current practices, tool usage, perceived benefits, limitations, and desired enhancements to gain a broad overview of the validation landscape. Our analysis shows that while RDF validation is widely adopted and valued for enhancing data quality, significant challenges remain. In particular, users report a need for better documentation, improved tool support, enhanced performance, and greater language expressiveness to handle complex large-scale validation tasks effectively. This work provides crucial insights into the RDF validation landscape, highlighting current practices and key areas for development. It offers a foundation for researchers, developers, and standardization bodies to address current limitations and advance validation technologies, ultimately improving data quality and usability in knowledge graphs.

42.3DBJun 4
Validation of graph databases against PG-Schema

Jacek Ciszewski, Jakub Kłos, Maxime Jakubowski et al.

The problem of validating a given graph database instance against a given PG-Schema graph type without integrity constraints is NP- complete in terms of combined complexity and in PTIME in terms of data complexity. The combined complexity drops to PTIME when the alternation between type combinations and unions is suitably restricted

48.5LOApr 22
Common Foundations for Recursive Shape Languages

Shqiponja Ahmetaj, Iovka Boneva, Jan Hidders et al.

As schema languages for RDF data become more mature, we are seeing efforts to extend them with recursive semantics, applying diverse ideas from logic programming and description logics. While ShEx has an official recursive semantics based on greatest fixpoints (GFP), the discussion for SHACL is ongoing and seems to be converging towards least fixpoints (LFP). A practical study we perform shows that, indeed, ShEx validators implement GFP, whereas SHACL validators are more heterogeneous. This situation creates tension between ShEx and SHACL, as their semantic commitments appear to diverge, potentially undermining interoperability and predictability. We aim to clarify this design space by comparing the main semantic options in a principled yet accessible way, hoping to engage both theoreticians and practitioners, especially those involved in developing tools and standards. We present a unifying formal semantics that treats LFP, GFP, and supported model semantics (SMS), clarifying their relationships and highlighting a duality between LFP and GFP on stratified fragments. Next, we investigate to which extent the directions taken by SHACL and ShEx are compatible. We show that, although ShEx and SHACL seem to be going in different directions, they include large fragments with identical expressive power. Moreover, there is a strong correspondence between these fragments through the aforementioned principle of duality. Finally, we present a complete picture of the data and combined complexity of ShEx and SHACL validation under LFP, GFP, and SMS, showing that SMS comes at a higher computational cost under standard complexity-theoretic assumptions.

66.5DBMay 6
A Graph-Native Approach to Normalization

Johannes Schrott, Maxime Jakubowski, Katja Hose

In recent years, knowledge graphs (KGs) - in particular in the form of labeled property graphs (LPGs) - have become essential components in a broad range of applications. Although the absence of strict schemas for KGs facilitates structural issues that lead to redundancies and subsequently to inconsistencies and anomalies, the problem of KG quality has so far received only little attention. Inspired by normalization using functional dependencies for relational data, a first approach exploiting dependencies within nodes has been proposed. However, real-world KGs also expose functional dependencies involving edges. In this paper, we therefore propose graph-native normalization, which considers dependencies within nodes, edges, and their combination. We define a range of graph-native normal forms and graph object functional dependencies and propose algorithms for transforming graphs accordingly. We evaluate our contributions using a broad range of synthetic and native graph datasets.

SEMay 26, 2025Code
RDFGraphGen: An RDF Graph Generator based on SHACL Shapes

Milos Jovanovik, Marija Vecovska, Maxime Jakubowski et al.

Developing and testing modern RDF-based applications often requires access to RDF datasets with certain characteristics. Unfortunately, it is very difficult to publicly find domain-specific knowledge graphs that conform to a particular set of characteristics. Hence, in this paper we propose RDFGraphGen, an open-source RDF graph generator that uses characteristics provided in the form of SHACL (Shapes Constraint Language) shapes to generate synthetic RDF graphs. RDFGraphGen is domain-agnostic, with configurable graph structure, value constraints, and distributions. It also comes with a number of predefined values for popular schema.org classes and properties, for more realistic graphs. Our results show that RDFGraphGen is scalable and can generate small, medium, and large RDF graphs in any domain.

DBDec 22, 2021
Shape Fragments

Thomas Delva, Anastasia Dimou, Maxime Jakubowski et al.

In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose in this paper a novel use of shapes, by which a set of shapes is used to extract a subgraph from an RDF graph, the so-called shape fragment. Our proposed mechanism fits in the framework of Linked Data Fragments. In this paper, (i) we define our extraction mechanism formally, building on recently proposed SHACL formalizations; (ii) we establish correctness properties, which relate shape fragments to notions of provenance for database queries; (iii) we compare shape fragments with SPARQL queries; (iv) we discuss implementation options; and (v) we present initial experiments demonstrating that shape fragments are a feasible new idea.

DBSep 17, 2021
Fixpoint Semantics for Recursive SHACL

Bart Bogaerts, Maxime Jakubowski

SHACL is a W3C-proposed language for expressing structural constraints on RDF graphs. The recommendation only specifies semantics for non-recursive SHACL; recently, some efforts have been made to allow recursive SHACL schemas. In this paper, we argue that for defining and studying semantics of recursive SHACL, lessons can be learned from years of research in non-monotonic reasoning. We show that from a SHACL schema, a three-valued semantic operator can directly be obtained. Building on Approximation Fixpoint Theory (AFT), this operator immediately induces a wide variety of semantics, including a supported, stable, and well-founded semantics, related in the expected ways. By building on AFT, a rich body of theoretical results becomes directly available for SHACL. As such, the main contribution of this short paper is providing theoretical foundations for the study of recursive SHACL, which can later enable an informed decision for an extension of the W3C recommendation.