Arcangelo Massari

DL
3 papers
14 citations
Novelty 37%
AI Score 45

3 Papers

32.2 DB Apr 26
Time travel for knowledge graphs: live queries over RDF change histories

Arcangelo Massari, Silvio Peroni

Performing time-traversal queries on RDF datasets remains unsupported in the most extensive knowledge graphs. Existing solutions either require offline ingestion, which prevents concurrent querying and updating, or operate live but with limited query coverage or triplestore dependency. This article presents the Time Agnostic Library, a Python library for performing temporal SPARQL queries live on any SPARQL-compliant triplestore, supporting all six temporal retrieval needs identified in the literature and concurrent updates. The methodology builds on the OpenCitations Data Model (OCDM), which records provenance using the Provenance Ontology (PROV-O) and SPARQL UPDATE operations. The library supports version materialization, single-version and cross-version structured queries, delta materialization, and single-delta and cross-delta structured queries over multi-triple patterns. Evaluation on the BEAR-B benchmark shows sub-linear scaling in both execution time and memory consumption as the number of versions increases. While preprocessing-based systems such as OSTRICH achieve faster query times, they require offline ingestion and cannot handle concurrent data updates. Against R43ples, the closest live system in architecture, the Time Agnostic Library is faster across all query types.
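The idea of version materialization from a recorded change history can be sketched in a few lines. This is a minimal illustration, not the Time Agnostic Library's API: it assumes each provenance snapshot has already been reduced to a timestamp plus the sets of triples it added and deleted, and the names `Snapshot` and `materialize` are hypothetical. A past state is rebuilt from the current one by undoing every later change, newest first.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Snapshot:
    """One recorded change (hypothetical shape): triples added and deleted at a time."""
    time: str  # ISO 8601 timestamp, so string order matches chronological order
    added: frozenset = frozenset()
    deleted: frozenset = frozenset()


def materialize(current: set, history: list, at: str) -> set:
    """Rebuild the state valid at `at` by undoing every change made after it."""
    state = set(current)
    # Walk the history from the newest snapshot back to the oldest.
    for snap in sorted(history, key=lambda s: s.time, reverse=True):
        if snap.time <= at:
            break  # this change and all older ones were already in effect at `at`
        state -= snap.added    # undo the additions
        state |= snap.deleted  # restore the deletions
    return state


draft = ("ex:paper", "dc:title", "Draft")
final = ("ex:paper", "dc:title", "Final")
history = [
    Snapshot("2024-01-01T00:00:00", added=frozenset({draft})),
    Snapshot("2024-02-01T00:00:00", added=frozenset({final}), deleted=frozenset({draft})),
]
current = {final}

mid_january = materialize(current, history, "2024-01-15T00:00:00")
before_creation = materialize(current, history, "2023-12-01T00:00:00")
```

Here `mid_january` contains only the "Draft" title and `before_creation` is empty. Working backwards from the live state is one reason such an approach needs no offline ingestion step: only the current data and the stored deltas are read.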

61.7 DL Apr 23
OpenCitations Meta

Arcangelo Massari, Fabio Mariani, Ivan Heibi et al.

OpenCitations Meta is a new database for open bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure, adhering to Open Science principles and published under a CC0 license to promote maximum reuse. It presently incorporates bibliographic metadata for publications recorded in Crossref, DataCite and PubMed, making it the largest bibliographic metadata source using Semantic Web technologies. It assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to all bibliographic resources, enabling it both to disambiguate publications described using different external PIDs (e.g., a DOI in Crossref and a PMID in PubMed), and to handle citations involving publications lacking external PIDs. By hosting bibliographic metadata internally, OpenCitations Meta eliminates its former reliance on API calls to external resources and thus enhances performance in response to user queries. Its automated data curation, following the OpenCitations Data Model, includes deduplication, error correction, metadata enrichment and full provenance tracking, ensuring transparency and traceability of data and bolstering confidence in data integrity, a feature unparalleled in other bibliographic databases. Its commitment to Semantic Web standards ensures superior interoperability compared to other machine-readable formats, with availability via a SPARQL endpoint, REST APIs and data dumps.
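The role of an internal identifier in disambiguation can be illustrated with a small sketch. It is not the OpenCitations Meta pipeline: it assumes each incoming record is just a set of external PID strings, the function name `assign_internal_ids` is hypothetical, and the `local:N` labels are placeholders rather than the real OMID format. Records that share any external PID, directly or through a chain of other records, receive the same internal identifier, while records with no external PID still get one of their own.

```python
def assign_internal_ids(records: list) -> list:
    """Give one internal id to every group of records linked by a shared external PID."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        # Follow parent links to the group root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # external PID -> index of the first record carrying it
    for idx, pids in enumerate(records):
        for pid in pids:
            if pid in first_seen:
                a, b = find(first_seen[pid]), find(idx)
                parent[max(a, b)] = min(a, b)  # merge the two groups
            else:
                first_seen[pid] = idx

    labels = {}  # group root -> placeholder internal id
    result = []
    for idx in range(len(records)):
        root = find(idx)
        if root not in labels:
            labels[root] = f"local:{len(labels) + 1}"
        result.append(labels[root])
    return result


records = [
    {"doi:10.1/a"},             # known only by DOI
    {"doi:10.1/a", "pmid:42"},  # same work, known by DOI and PMID
    set(),                      # cited work with no external PID
    {"pmid:42"},                # same work again, known only by PMID
    set(),                      # another work with no external PID
]
ids = assign_internal_ids(records)
```

The first, second and fourth records end up with the same label, and each record without an external PID gets a distinct one, which is what lets citations to such works be represented at all.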

97.3 DL May 3 Code
HERITRACE: a domain-agnostic framework for SHACL-driven RDF curation with provenance and change tracking

Arcangelo Massari, Silvio Peroni

HERITRACE is an open-source web application that enables users without Semantic Web expertise to curate RDF data through form-based interfaces with automatic provenance documentation and change tracking in RDF. It uses SHACL for data model definition and form generation, connects to existing SPARQL-accessible stores without data migration, and records every modification as a provenance snapshot that can be browsed and restored. HERITRACE is domain-agnostic: adapting it to a new collection requires only SHACL shapes and YAML display rules, without code changes. This paper describes the software architecture and provides the first empirical evaluation. HERITRACE is deployed in production for the ParaText project, where classical philologists curate bibliographic data about ancient Greek exegetical traditions, and is planned as the editing interface for OpenCitations and as the curation layer for the Social Sciences and Humanities Citation Index within the GRAPHIA Horizon Europe project. Since it operates on any SPARQL-accessible store without data migration, its adoption potential extends to any domain maintaining RDF data. HERITRACE is publicly available on GitHub under the ISC license, archived on Zenodo and Software Heritage Archive, and documented for deployment with a pre-built Docker image.