DB · AI · DC · Mar 28, 2022

WawPart: Workload-Aware Partitioning of Knowledge Graphs

arXiv:2203.14888v1 · 5 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in distributed graph processing for applications relying on large-scale knowledge graphs, representing an incremental improvement over existing partitioning techniques.

The paper tackles the degradation of distributed knowledge graph query performance caused by cross-partition joins, introducing a workload-aware partitioning method that reduces distributed joins and improves query processing time.
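To make "cross-partition join" concrete, here is a minimal sketch, assuming a hypothetical predicate-based placement in which all triples with the same predicate live on one shard. A query is modeled as a list of triple patterns (terms beginning with `?` are variables), and a join is counted as distributed when two patterns share a variable but their predicates sit on different shards. The names `variables`, `count_distributed_joins`, and the sample predicates are illustrative only, not taken from the paper.

```python
from itertools import combinations

# Hypothetical representation: a triple pattern is (subject, predicate, object);
# terms starting with "?" are variables, and predicates are assumed to be constants.
TriplePattern = tuple[str, str, str]


def variables(pattern: TriplePattern) -> set[str]:
    """Return the variable terms of one triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def count_distributed_joins(query: list[TriplePattern], shard_of: dict[str, int]) -> int:
    """Count pattern pairs that join on a shared variable but whose
    predicates are stored on different shards (assumed predicate-based placement)."""
    distributed = 0
    for left, right in combinations(query, 2):
        joins = bool(variables(left) & variables(right))
        if joins and shard_of[left[1]] != shard_of[right[1]]:
            distributed += 1
    return distributed


query = [
    ("?person", "bornIn", "?city"),
    ("?city", "locatedIn", "?country"),
    ("?person", "worksFor", "?org"),
]
split = {"bornIn": 0, "locatedIn": 1, "worksFor": 0}
together = {"bornIn": 0, "locatedIn": 0, "worksFor": 0}
print(count_distributed_joins(query, split))     # 1: bornIn/locatedIn join on ?city across shards
print(count_distributed_joins(query, together))  # 0: every join is shard-local
```

Counting pattern pairs is a crude proxy (a variable shared by three patterns counts as three pairs), but it shows why the same query gets cheaper when a partitioning keeps its joined predicates together.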

Large-scale datasets in the form of knowledge graphs are used in numerous domains today. A knowledge graph's size often exceeds the capacity of a single computer system, especially if the graph must be stored in main memory. To overcome this, knowledge graphs can be partitioned into multiple sub-graphs and distributed as shards among many computing nodes. As a result, however, the performance of many common graph tasks, such as querying, suffers. This is due to the distributed joins mandated by graph edges that cross (cut) the partitions. In this paper, we propose a knowledge graph partitioning method that takes into account a set of queries (the workload). The resulting partitioning aims to reduce the number of distributed joins and improve workload performance. Critical features identified in the query workload and the knowledge graph are used to cluster the queries and then partition the graph. Queries are rewritten to account for the graph partitioning. Our evaluation results demonstrate an improvement in workload processing time.
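The cluster-then-partition pipeline in the abstract can be illustrated with a deliberately simple sketch. This is not WawPart's algorithm: the abstract does not say which "critical features" are used, so the sketch assumes the feature is the set of predicates each query touches. Predicates that co-occur in any query are merged with union-find, each resulting group is placed whole on the least-loaded shard (largest groups first), and the "rewrite" step is reduced to routing each query to the shards it needs. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object); query predicates assumed constant


def cluster_predicates(workload: dict[str, list[Triple]]) -> list[set[str]]:
    """Union-find over predicates: predicates used together in any query
    end up in the same group, so each query's predicates fall in one group."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for patterns in workload.values():
        preds = [p for _, p, _ in patterns]
        for p in preds:
            parent.setdefault(p, p)
        for p in preds[1:]:
            parent[find(preds[0])] = find(p)

    groups: dict[str, set[str]] = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return list(groups.values())


def partition_graph(triples: list[Triple], workload: dict[str, list[Triple]], num_shards: int) -> dict[str, int]:
    """Assign each predicate group to the currently least-loaded shard, largest groups first."""
    size: dict[str, int] = defaultdict(int)
    for _, p, _ in triples:
        size[p] += 1
    groups = cluster_predicates(workload)
    covered = set().union(*groups)
    groups += [{p} for p in size if p not in covered]  # predicates no query uses
    load = [0] * num_shards
    shard_of: dict[str, int] = {}
    for group in sorted(groups, key=lambda g: sum(size[p] for p in g), reverse=True):
        target = min(range(num_shards), key=load.__getitem__)
        for p in group:
            shard_of[p] = target
        load[target] += sum(size[p] for p in group)
    return shard_of


def route_query(patterns: list[Triple], shard_of: dict[str, int]) -> list[int]:
    """Simplified 'rewrite': list the shards a query must contact."""
    return sorted({shard_of[p] for _, p, _ in patterns})


graph = [
    ("alice", "bornIn", "paris"), ("paris", "locatedIn", "france"),
    ("alice", "worksFor", "acme"), ("acme", "basedIn", "berlin"),
    ("bob", "knows", "alice"), ("bob", "knows", "carol"),
]
workload = {
    "q1": [("?p", "bornIn", "?c"), ("?c", "locatedIn", "?country")],
    "q2": [("?p", "worksFor", "?o"), ("?o", "basedIn", "?city")],
    "q3": [("?a", "knows", "?b")],
}
shard_of = partition_graph(graph, workload, num_shards=2)
for name, patterns in workload.items():
    print(name, route_query(patterns, shard_of))  # q1 [0], q2 [1], q3 [0]
```

In this toy workload every query lands on a single shard, so none of its joins is distributed. The trade-off is balance: if many queries share predicates, union-find can merge them into one oversized group, and a practical partitioner would have to split such groups and accept some distributed joins.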
