DBLGOct 3, 2025

Is it Bigger than a Breadbox: Efficient Cardinality Estimation for Real World Workloads

arXiv:2510.03386v11 citationsh-index: 36
Originality Incremental advance
AI Analysis

This addresses the operational complexity and accuracy issues in cardinality estimation for database engines, offering a practical solution for real-world workloads.

The paper tackled the problem of cardinality estimation errors in database query execution by introducing a method that learns many simple regressors online, each localized to repetitive subquery patterns, which competes with state-of-the-art learning-based approaches in accuracy. It achieved a 7.5-minute speed-up (>30%) on the JOB-lite workload with only 37 seconds of overhead for online learning.

DB engines produce efficient query execution plans by relying on cost models. Practical implementations estimate cardinality of queries using heuristics, with magic numbers tuned to improve average performance on benchmarks. Empirically, estimation error significantly grows with query complexity. Alternatively, learning-based estimators offer improved accuracy, but add operational complexity preventing their adoption in-practice. Recognizing that query workloads contain highly repetitive subquery patterns, we learn many simple regressors online, each localized to a pattern. The regressor corresponding to a pattern can be randomly-accessed using hash of graph structure of the subquery. Our method has negligible overhead and competes with SoTA learning-based approaches on error metrics. Further, amending PostgreSQL with our method achieves notable accuracy and runtime improvements over traditional methods and drastically reduces operational costs compared to other learned cardinality estimators, thereby offering the most practical and efficient solution on the Pareto frontier. Concretely, simulating JOB-lite workload on IMDb speeds-up execution by 7.5 minutes (>30%) while incurring only 37 seconds overhead for online learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes