Paweł Misiorek

2 Papers

12.3 DB May 15
Relational Database Data Lineage Ontology

Jakub Dutkiewicz, Paweł Misiorek, Robert Wrembel

Modeling data lineage in relational databases remains a challenging problem, particularly when dependencies between database objects are incomplete or missing. In this paper, we propose a novel ontology for relational database data lineage, designed to provide a richer, more expressive semantic representation that supports the discovery of lineage links by means of knowledge graphs (KGs). Building upon our previous work on KG-based lineage discovery, the proposed ontology extends the earlier model with additional concepts that capture structural, semantic, and transformation-level characteristics of relational data. These extensions enable more precise encoding of lineage evidence. To evaluate the impact of the proposed ontology, we conduct a comparative study using a KG-based inductive link prediction framework. Specifically, we assess the performance of a graph neural network model based on path embeddings under two settings: one using the original baseline ontology and one using the newly proposed ontology. Experimental results demonstrate that the enriched semantic model improves lineage link prediction performance, as measured by the AUC and Hits@10 metrics.
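
The abstract reports AUC and Hits@10 but does not describe the evaluation protocol, so the following is only a minimal sketch of how these two metrics are commonly computed for link prediction. The function names, the pairwise AUC formulation, the handling of ties, and the use of per-query negative candidates are all assumptions for illustration, not the authors' implementation.

```python
def auc(pos_scores, neg_scores):
    """Pairwise AUC: probability that a true link outscores a false one.

    Ties count as half a win (an assumed convention).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hits_at_k(pos_score, neg_scores, k=10):
    """Return 1 if the true link ranks within the top k among its negatives.

    The rank is optimistic on ties (an assumed convention): only strictly
    higher-scoring negatives push the true link down.
    """
    rank = 1 + sum(1 for n in neg_scores if n > pos_score)
    return 1 if rank <= k else 0


def mean_hits_at_k(queries, k=10):
    """Average Hits@k over (true_link_score, negative_scores) pairs."""
    return sum(hits_at_k(p, negs, k) for p, negs in queries) / len(queries)
```

Under these assumptions, comparing the two ontologies would amount to scoring the same candidate lineage links with a model trained on each KG variant and comparing the resulting `auc` and `mean_hits_at_k` values.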

SI Jun 25, 2024
Modularity Based Community Detection in Hypergraphs

Bogumił Kamiński, Paweł Misiorek, Paweł Prałat et al.

In this paper, we propose h-Louvain, a scalable community detection algorithm based on the hypergraph modularity function. It adapts the classical Louvain algorithm to hypergraphs. We observe that directly applying the Louvain algorithm to optimize the hypergraph modularity function often fails to find meaningful communities. We address this issue by adjusting the initial stage of the algorithm: instead of the hypergraph modularity function alone, it optimizes a carefully and dynamically tuned linear combination of the graph modularity function of the corresponding two-section graph and the desired hypergraph modularity function. The process is guided by Bayesian optimization of the hyper-parameters of the proposed procedure. Experiments on synthetic as well as real-world networks show that this process yields improved results in various regimes.
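
The linear combination described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not state the two-section edge weighting (the code uses 1/(|e|-1) per pair), which hypergraph modularity variant is used (the code uses a "strict" variant that only rewards hyperedges lying entirely inside one community), or how the mixing weight, here called `alpha`, is scheduled. It evaluates the combined objective for a fixed partition and does not implement the Louvain moves or the Bayesian optimization.

```python
from collections import defaultdict
from itertools import combinations


def two_section_weights(hyperedges):
    """Clique-expand each hyperedge into weighted graph edges.

    Assumption: each pair inside a hyperedge e gets weight 1 / (|e| - 1).
    """
    weights = defaultdict(float)
    for e in hyperedges:
        nodes = sorted(set(e))
        if len(nodes) < 2:
            continue
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1.0 / (len(nodes) - 1)
    return dict(weights)


def graph_modularity(weights, community):
    """Weighted Newman modularity of a partition (node -> community id)."""
    total = sum(weights.values())
    degree = defaultdict(float)
    internal = defaultdict(float)
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    volume = defaultdict(float)
    for node, d in degree.items():
        volume[community[node]] += d
    return sum(internal[c] / total - (volume[c] / (2 * total)) ** 2
               for c in volume)


def strict_hypergraph_modularity(hyperedges, community):
    """Assumed 'strict' variant: count hyperedges fully inside a community
    and subtract the expected count under a degree-based null model."""
    edges = [set(e) for e in hyperedges if len(set(e)) >= 2]
    degree = defaultdict(int)
    for e in edges:
        for v in e:
            degree[v] += 1
    volume = defaultdict(float)
    for v, d in degree.items():
        volume[community[v]] += d
    total_volume = sum(volume.values())
    inside = sum(1 for e in edges if len({community[v] for v in e}) == 1)
    size_counts = defaultdict(int)
    for e in edges:
        size_counts[len(e)] += 1
    expected = sum(count * sum((x / total_volume) ** size
                               for x in volume.values())
                   for size, count in size_counts.items())
    return (inside - expected) / len(edges)


def combined_objective(hyperedges, community, alpha):
    """(1 - alpha) * two-section graph modularity + alpha * hypergraph modularity."""
    q_graph = graph_modularity(two_section_weights(hyperedges), community)
    q_hyper = strict_hypergraph_modularity(hyperedges, community)
    return (1 - alpha) * q_graph + alpha * q_hyper
```

With `alpha = 0` the objective reduces to ordinary graph modularity on the two-section graph, and with `alpha = 1` it is the hypergraph modularity alone; a schedule that moves `alpha` between these extremes is one plausible reading of the "dynamically tuned" combination.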