Managing FAIR Knowledge Graphs as Polyglot Data End Points: A Benchmark based on the rdf2pg Framework and Plant Biology Data
This work addresses data management challenges faced by researchers and practitioners in domains such as plant biology by enabling polyglot access to knowledge graphs. The contribution is incremental, since it builds on existing standards and tools.
The paper tackles the integration of Linked Data and labelled property graphs (LPG) by introducing rdf2pg, a framework for mapping RDF data to LPG formats. Using plant biology data, it benchmarks three graph databases (Virtuoso, Neo4j, ArcadeDB) and three query languages (SPARQL, Cypher, Gremlin), and reports qualitative and quantitative assessments of their strengths and limitations.
Linked Data and labelled property graphs (LPG) are two data management approaches with complementary strengths and weaknesses, making their integration beneficial for sharing datasets and supporting software ecosystems. In this paper, we introduce rdf2pg, an extensible framework for mapping RDF data to semantically equivalent LPG formats and databases. Utilising this framework, we perform a comparative analysis of three popular graph databases - Virtuoso, Neo4j, and ArcadeDB - and the well-known graph query languages SPARQL, Cypher, and Gremlin. Our qualitative and quantitative assessments underline the strengths and limitations of these graph database technologies. Additionally, we highlight the potential of rdf2pg as a versatile tool for enabling polyglot access to knowledge graphs, aligning with established standards of Linked Data and the Semantic Web.
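To make the idea of an RDF-to-LPG mapping concrete, the following is a minimal sketch in plain Python. It is not the rdf2pg implementation, and it does not reproduce the mapping rules that rdf2pg actually uses. It assumes one common convention: `rdf:type` statements become node labels, literal-valued statements become node properties, and statements between resources become edges. The function name `triples_to_lpg`, the four-element triple format, and the `ex:` identifiers are all hypothetical and serve only as illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def triples_to_lpg(triples):
    """Convert simplified RDF triples into LPG-style nodes and edges.

    Each triple is (subject, predicate, object, object_is_literal).
    This input format is an assumption made for the sketch.
    """
    nodes = defaultdict(lambda: {"labels": set(), "properties": {}})
    edges = []
    for subject, predicate, obj, is_literal in triples:
        node = nodes[subject]
        if predicate == RDF_TYPE:
            # Type statements become node labels.
            node["labels"].add(obj)
        elif is_literal:
            # Literal-valued statements become node properties.
            node["properties"][predicate] = obj
        else:
            # Resource-to-resource statements become edges;
            # touching nodes[obj] makes sure the target node exists.
            nodes[obj]
            edges.append((subject, predicate, obj))
    return dict(nodes), edges


# Hypothetical plant biology example data.
triples = [
    ("ex:gene1", RDF_TYPE, "ex:Gene", False),
    ("ex:gene1", "ex:name", "geneA", True),
    ("ex:protein1", RDF_TYPE, "ex:Protein", False),
    ("ex:gene1", "ex:encodes", "ex:protein1", False),
]

nodes, edges = triples_to_lpg(triples)
```

In this sketch, `ex:gene1` ends up as a node labelled `ex:Gene` with a `ex:name` property, connected to `ex:protein1` by an `ex:encodes` edge. A real mapping also has to deal with cases this sketch ignores, such as multi-valued properties, blank nodes, and relations that carry their own attributes.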