DB, LG · Dec 29, 2020

BayesCard: Revitalizing Bayesian Frameworks for Cardinality Estimation

Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, Jingren Zhou

arXiv:2012.14743v2 · 10.333 citations

Originality: Highly original

AI Analysis

This work provides a more efficient and accurate cardinality estimation method for database management systems, which can significantly improve query optimization for database users and administrators.

This paper addresses cardinality estimation (CardEst) in database management systems, a task that query optimizers depend on. The authors propose BayesCard, a Bayesian network-based framework that achieves comparable or better accuracy than state-of-the-art methods while being 1-2 orders of magnitude faster in inference, 1-3 orders of magnitude faster in training, and 1-3 orders of magnitude smaller in model size. When deployed in PostgreSQL, BayesCard improved end-to-end query time by 13.3% on the IMDB benchmark.

Cardinality estimation (CardEst) is an essential component in query optimizers and a fundamental problem in DBMS. A desired CardEst method should attain good algorithmic performance, be stable across varied data settings, and be friendly to system deployment. However, no existing CardEst method fulfills all three criteria at the same time. Traditional methods often have significant algorithmic drawbacks such as large estimation errors. Recently proposed deep learning based methods largely improve the estimation accuracy, but their performance can be greatly affected by the data and they are often difficult to deploy in a system. In this paper, we revitalize Bayesian networks (BNs) for CardEst by incorporating the techniques of probabilistic programming languages. We present BayesCard, the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability, while overcoming their drawbacks, i.e., low structure learning and inference efficiency. This makes BayesCard a perfect candidate for commercial DBMS deployment. Our experimental results on several single-table and multi-table benchmarks indicate BayesCard's superiority over existing state-of-the-art CardEst methods: BayesCard achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders of magnitude faster training time, 1-3 orders of magnitude smaller model size, and 1-2 orders of magnitude faster updates. BayesCard also maintains stable performance across varied data settings. We also deploy BayesCard into PostgreSQL. On the IMDB benchmark workload, it improves the end-to-end query time by 13.3%, which is very close to the optimal result of 14.2% obtained using an oracle of true cardinality.
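
To make the idea of BN-based cardinality estimation concrete, here is a minimal Python sketch. It is not BayesCard's implementation: it assumes a hand-picked chain structure A -> B -> C instead of learned structure, uses plain counting for the conditional probabilities, and the toy table, column values, and function names are all hypothetical. It estimates the cardinality of `A = a AND C = c` as N * sum_b P(a) P(b|a) P(c|b) and contrasts it with the classical attribute-independence estimate.

```python
from collections import Counter

# Hypothetical toy table with columns (A=country, B=city, C=income).
ROWS = [
    ("US", "NY", "high"),
    ("US", "NY", "high"),
    ("US", "SF", "high"),
    ("US", "SF", "low"),
    ("FR", "Paris", "low"),
    ("FR", "Paris", "low"),
    ("FR", "Lyon", "low"),
    ("FR", "Lyon", "high"),
]


def fit_chain(rows):
    """Fit P(A), P(B|A), P(C|B) by counting, for the assumed chain A -> B -> C."""
    n = len(rows)
    count_a = Counter(a for a, _, _ in rows)
    count_b = Counter(b for _, b, _ in rows)
    count_ab = Counter((a, b) for a, b, _ in rows)
    count_bc = Counter((b, c) for _, b, c in rows)
    p_a = {a: k / n for a, k in count_a.items()}
    p_b_given_a = {(a, b): k / count_a[a] for (a, b), k in count_ab.items()}
    p_c_given_b = {(b, c): k / count_b[b] for (b, c), k in count_bc.items()}
    return n, p_a, p_b_given_a, p_c_given_b


def estimate_bn(model, a, c):
    """Estimate the row count for A = a AND C = c by marginalizing out B."""
    n, p_a, p_b_given_a, p_c_given_b = model
    prob = 0.0
    for (a2, b), p_ba in p_b_given_a.items():
        if a2 == a:
            prob += p_a.get(a, 0.0) * p_ba * p_c_given_b.get((b, c), 0.0)
    return n * prob


def estimate_independent(rows, a, c):
    """Baseline: treat A and C as independent, i.e. N * P(a) * P(c)."""
    n = len(rows)
    p_a = sum(1 for r in rows if r[0] == a) / n
    p_c = sum(1 for r in rows if r[2] == c) / n
    return n * p_a * p_c


def true_cardinality(rows, a, c):
    """Exact answer, for comparison only."""
    return sum(1 for r in rows if r[0] == a and r[2] == c)


if __name__ == "__main__":
    model = fit_chain(ROWS)
    print("BN estimate:          ", estimate_bn(model, "US", "high"))
    print("Independence estimate:", estimate_independent(ROWS, "US", "high"))
    print("True cardinality:     ", true_cardinality(ROWS, "US", "high"))
```

On this toy table the chain assumption happens to hold exactly, so the BN estimate matches the true count while the independence baseline underestimates it. Real tables need a learned structure and efficient inference, which are the costs the abstract says BayesCard targets.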
