DBMar 10

K-Join: Combining Vertex Covers for Parallel Joins

Simon Frisk, Austen Fan, Paraschos Koutris

arXiv:2603.10177v19.6h-index: 24

Predicted impact top 30% in DB · last 90 daysOriginality Incremental advance

AI Analysis

This work addresses the open question of optimal parallel join processing for database systems, offering incremental improvements over existing methods.

The paper tackles the problem of determining the best parallel algorithm for join queries in massively parallel computation by introducing a new algorithm that combines data partitioning and the HyperCube primitive with a novel choice of HyperCube shares based on linear combinations of vertex covers. The result is a load characterized as n/p^{1/κ}, where κ is a new hypergraph measure called reduced quasi vertex-cover, which matches or improves on all state-of-the-art algorithms.

Significant research effort has been devoted to improving the performance of join processing in the massively parallel computation model, where the goal is to evaluate a query with the minimum possible data transfer between machines. However, it is still an open question to determine the best possible parallel algorithm for any join query. In this paper, we present an algorithm that takes a step forward in this endeavour. Our new algorithm is simple and builds on two existing ideas: data partitioning and the HyperCube primitive. The novelty in our approach comes from a careful choice of the HyperCube shares, which is done as a linear combination of multiple vertex covers. The resulting load with input size $n$ and $p$ processors is characterized as $n/p^{1/κ}$, where $κ$ is a new hypergraph theoretic measure we call the reduced quasi vertex-cover. The new measure matches or improves on all state-of-the-art algorithms and exhibits strong similarities to the edge quasi-packing that describes the worst-case optimal load in one-round algorithms.

View on arXiv PDF

Similar