Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction
The paper tackles learned cost estimation for databases by introducing zero-shot cost models, which generalize to unseen databases without expensive training data collection and achieve more accurate cost estimates than state-of-the-art models across a wide range of real-world databases.
This removes the bottleneck of workload-driven approaches, which must execute a large set of training queries on each new database, and makes learned cost models practical to deploy in database management systems.
In this paper, we introduce zero-shot cost models, which enable learned cost estimation that generalizes to unseen databases. In contrast to state-of-the-art workload-driven approaches, which require executing a large set of training queries on every new database, zero-shot cost models allow a learned cost model to be instantiated out-of-the-box without expensive training data collection. To enable such zero-shot cost models, we suggest a new learning paradigm based on pre-trained cost models. As core contributions to support the transfer of such a pre-trained cost model to unseen databases, we introduce a new model architecture and a representation technique for encoding query workloads as input to those models. As we show in our evaluation, zero-shot cost estimation can provide more accurate cost estimates than state-of-the-art models for a wide range of (real-world) databases without requiring any query executions on unseen databases. Furthermore, we show that zero-shot cost models can be used in a few-shot mode that further improves their quality by retraining them with just a small number of additional training queries on the unseen database.
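The workflow described above (pre-train once, predict on an unseen database, optionally retrain with a few queries) can be illustrated with a minimal sketch. It is not the model architecture or encoding introduced in the paper: the operator vocabulary, the feature set (`est_rows`, `row_width`), the linear regressor, and all numbers are assumptions chosen only to show the idea that plans are encoded without table or column identifiers, so the same encoding applies to any database.

```python
import math

# Hypothetical operator vocabulary, shared by every database.
OPERATORS = ["scan", "filter", "hash_join"]


def featurize(node):
    """Encode one plan operator with database-independent features only."""
    one_hot = [1.0 if node["op"] == op else 0.0 for op in OPERATORS]
    return one_hot + [math.log1p(node["est_rows"]), math.log1p(node["row_width"])]


def flatten(plan):
    """Yield every operator of a plan tree, root first."""
    yield plan
    for child in plan.get("children", []):
        yield from flatten(child)


def plan_features(plan):
    """Sum operator encodings over the plan (a crude stand-in for a learned encoder)."""
    return [sum(col) for col in zip(*(featurize(n) for n in flatten(plan)))]


class LinearCostModel:
    """Toy regressor that predicts log-runtime from plan features."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def fit(self, xs, ys, epochs=500, lr=1e-4):
        """Full-batch gradient descent that continues from the current weights."""
        n = len(xs)
        for _ in range(epochs):
            errs = [self.predict(x) - y for x, y in zip(xs, ys)]
            for j in range(len(self.w)):
                self.w[j] -= lr * sum(e * x[j] for e, x in zip(errs, xs)) / n
            self.b -= lr * sum(errs) / n


def op(name, rows, width, *children):
    """Helper for building plan trees in this example."""
    return {"op": name, "est_rows": rows, "row_width": width, "children": list(children)}


# Pre-training: plans and log-runtimes from several training databases.
# All values are made up for illustration.
train_plans = [
    op("scan", 1_000, 16),
    op("filter", 200, 16, op("scan", 50_000, 16)),
    op("hash_join", 5_000, 48, op("scan", 100_000, 24), op("scan", 2_000, 24)),
]
train_log_runtimes = [1.0, 3.5, 6.0]

model = LinearCostModel(dim=len(OPERATORS) + 2)
model.fit([plan_features(p) for p in train_plans], train_log_runtimes)

# Zero-shot: estimate a plan on an unseen database; no queries are executed there.
unseen_plan = op("filter", 300, 32, op("scan", 80_000, 32))
zero_shot_estimate = model.predict(plan_features(unseen_plan))

# Few-shot: continue training with a handful of executed queries from that database.
few_shot_plans = [unseen_plan, op("scan", 80_000, 32)]
few_shot_log_runtimes = [4.2, 3.9]
model.fit([plan_features(p) for p in few_shot_plans], few_shot_log_runtimes, epochs=100)
few_shot_estimate = model.predict(plan_features(unseen_plan))
```

The point of the design is that `featurize` never looks at schema-specific names, so nothing in the pre-trained weights is tied to one database. Few-shot mode is then just a second call to `fit` that starts from the pre-trained weights instead of from scratch.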