DB · May 15

Relational Database Data Lineage Ontology

arXiv:2605.16068 · 16.9
Predicted impact: top 70% in DB · last 90 days
Originality: Incremental advance
AI Analysis

For researchers and practitioners dealing with incomplete data lineage in relational databases, this work provides an incremental improvement in lineage discovery accuracy through ontology enrichment.

The authors propose a novel ontology for relational database data lineage that extends previous work with richer semantics, and show via a graph neural network link prediction framework that it improves lineage link prediction performance (AUC and Hits@10) over the baseline ontology.

Modeling data lineage in relational databases remains challenging, particularly when dependencies between database objects are incomplete or missing. In this paper, we propose a novel ontology for relational database data lineage, designed to provide a richer and more expressive semantic representation that supports the discovery of lineage links by means of knowledge graphs (KGs). Building upon our previous work on KG-based lineage discovery, the proposed ontology extends the earlier model with additional concepts capturing structural, semantic, and transformation-level characteristics of relational data. These extensions enable more precise encoding of lineage evidence. To evaluate the impact of the proposed ontology, we conduct a comparative study using a KG-based inductive link prediction framework. Specifically, we assess the performance of a graph neural network model based on path embeddings under two settings: using the original baseline ontology and using the newly proposed one. Experimental results show that the enriched semantic model improves lineage link prediction performance, as measured by AUC and Hits@10.
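
The abstract reports results in terms of AUC and Hits@10 but does not spell out the evaluation protocol. As a rough illustration only, the sketch below shows one common way these two metrics are computed for link prediction: AUC as the probability that a true link scores higher than a false one, and Hits@k as the share of true links that rank within the top k among their own sampled negatives. The function names, the tie handling, and the per-positive negative sampling are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Pairwise AUC: chance that a random true link outscores a random false one.

    Ties count as half a win. This is an O(P * N) illustration, not an
    optimized implementation.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hits_at_k(
    pos_scores: Sequence[float],
    neg_scores_per_pos: Sequence[Sequence[float]],
    k: int = 10,
) -> float:
    """Fraction of true links ranked in the top k among their own negatives.

    Assumption: each true link is compared against its own list of sampled
    negative links, and ties are resolved in favour of the true link.
    """
    hits = 0
    for p, negs in zip(pos_scores, neg_scores_per_pos):
        # Rank 1 means no negative scored strictly higher than the true link.
        rank = 1 + sum(1 for n in negs if n > p)
        if rank <= k:
            hits += 1
    return hits / len(pos_scores)
```

Under this reading, comparing the baseline and the enriched ontology amounts to scoring the same held-out lineage links with a model trained on each KG and comparing the two pairs of metric values.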

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
