DBMar 11

Poisson Sampling over Acyclic Joins

arXiv:2603.10982v11.9h-index: 3
Predicted impact top 80% in DB · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses efficient sampling for database query engines, offering a unified approach for join processing and sampling, but it is incremental as it builds on existing methods for acyclic joins.

The paper tackles the problem of Poisson sampling over acyclic joins by proposing an algorithm that runs in time O(N + k log N), where N is the input database size and k is the sample size, and demonstrates through experiments on real-world data that it significantly outperforms the repeated-Bernoulli-trial algorithm.

We introduce the problem of Poisson sampling over joins: compute a sample of the result of a join query by conceptually performing a Bernoulli trial for each join tuple, using a non-uniform and tuple-specific probability. We propose an algorithm for Poisson sampling over acyclic joins that is nearly instance-optimal, running in time O(N + k \log N) where N is the size of the input database, and k is the size of the resulting sample. Our algorithm hinges on two building blocks: (1) The construction of a random-access index that allows, given a number i, to randomly access the i-th join tuple without fully materializing the (possibly large) join result; (2) The probing of this index to construct the result sample. We study the engineering trade-offs required to make both components practical, focusing on their implementation in column stores, and identify best-performing alternatives for both. Our experiments on real-world data demonstrate that this pair of alternatives significantly outperforms the repeated-Bernoulli-trial algorithm for Poisson sampling while also demonstrating that the random-access index by itself can be used to competively implement Yannakakis' acyclic join processing algorithm when no sampling is required. This shows that, as far a query engine design is concerned, it is possible to adopt a uniform basis for both classical acyclic join processing and Poisson sampling, both without regret compared to classical join and sampling algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes