Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data
For researchers in materials science and atomistic simulation, this work provides a practical framework to improve data findability, interoperability, and reuse, addressing a key bottleneck in data-driven discovery.
The authors present an ontology-based infrastructure that integrates heterogeneous atomistic simulation data into a knowledge graph, enabling consistent querying and analysis across datasets. The resulting graph contains over 750,000 triples describing nearly 8,000 computational samples.
The reuse of atomistic simulation data is often limited by heterogeneous formats, incomplete metadata, and a lack of standardized representations of workflows and provenance. Here we present an ontology-based infrastructure for representing and integrating atomistic simulation data as a knowledge graph. The approach combines domain ontologies with a software framework that enables data capture both from existing datasets and directly from simulation workflows at the point of generation. Heterogeneous data from multiple sources are normalized into a common, ontology-aligned representation, enabling consistent querying and analysis across datasets. We demonstrate these capabilities through the integration of grain boundary data, cross-dataset analysis of material properties, and extraction of derived thermodynamic quantities from existing simulations. In addition, workflows are represented in a machine-readable form, enabling both forward provenance tracking and partial reconstruction of computational procedures. The resulting knowledge graph contains over 750,000 triples describing nearly 8,000 computational samples. This work provides a practical framework for improving the findability, interoperability, and reuse of atomistic simulation data.
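As a rough illustration of the normalization and querying idea summarized above, the sketch below maps two differently shaped records onto one shared set of subject-predicate-object triples and then runs a single query across both. It is not the authors' framework: the field names, the `ex:` predicates, the numeric values, and the helpers `to_triples`, `match`, and `energy_per_atom` are all hypothetical, and plain Python tuples stand in for a real RDF store.

```python
# Illustrative sketch only: all names, predicates, and values are made-up placeholders.

# Two records for the same kind of sample, with different field names and units.
source_a = {"id": "a1", "element": "Fe", "energy_eV": -8.2, "natoms": 2}
source_b = {"name": "b7", "species": "Fe", "E_meV": -16400.0, "atom_count": 4}

# Per-source mapping: source field -> (shared predicate, converter to shared units).
MAPPINGS = {
    "A": {
        "element": ("ex:hasElement", str),
        "energy_eV": ("ex:hasEnergy_eV", float),
        "natoms": ("ex:hasNumberOfAtoms", int),
    },
    "B": {
        "species": ("ex:hasElement", str),
        "E_meV": ("ex:hasEnergy_eV", lambda v: float(v) / 1000.0),  # meV -> eV
        "atom_count": ("ex:hasNumberOfAtoms", int),
    },
}
ID_FIELD = {"A": "id", "B": "name"}


def to_triples(record, source):
    """Normalize one source record into triples that use the shared predicates."""
    subject = f"ex:sample/{source}/{record[ID_FIELD[source]]}"
    triples = [(subject, "rdf:type", "ex:ComputationalSample")]
    for field, (predicate, convert) in MAPPINGS[source].items():
        triples.append((subject, predicate, convert(record[field])))
    return triples


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def energy_per_atom(graph):
    """Cross-dataset query: energy per atom for every sample, whatever its origin."""
    result = {}
    for subject, _, _ in match(graph, p="rdf:type", o="ex:ComputationalSample"):
        energy = match(graph, s=subject, p="ex:hasEnergy_eV")[0][2]
        natoms = match(graph, s=subject, p="ex:hasNumberOfAtoms")[0][2]
        result[subject] = energy / natoms
    return result


graph = to_triples(source_a, "A") + to_triples(source_b, "B")

for subject, value in energy_per_atom(graph).items():
    print(subject, round(value, 3))
```

Once both records share the same predicates and units, the query no longer needs to know which source a sample came from. In practice an RDF library and SPARQL would presumably replace the tuple list and the `match` helper.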