DB LGAug 23, 2021

No DBA? No regret! Multi-armed bandits for index tuning of analytical and HTAP workloads with provable guarantees

R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic

arXiv:2108.10130v14.311 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of dynamic and unpredictable workloads in modern database systems, particularly for analytical and HTAP applications, by providing a self-driving solution with provable guarantees, though it builds on existing bandit methods in a specific domain.

The paper tackles the problem of automating index selection in databases without requiring manual input from database administrators or relying on query optimizer estimates, by using a multi-armed bandit approach to learn optimal structures through exploration and direct performance observation. It achieves up to 75% speed-up on shifting workloads in analytical processing and up to 59% in HTAP environments, outperforming a commercial tool and deep reinforcement learning.

Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs) who are expected to identify and supply representative training workloads. Even the latest advancements like query stores provide only limited support for dynamic environments. This status quo is untenable: identifying representative static workloads is no longer realistic; and physical design tools remain susceptible to the query optimiser's cost misestimates. Furthermore, modern application environments such as hybrid transactional and analytical processing (HTAP) systems render analytical modelling next to impossible. We propose a self-driving approach to online index selection that eschews the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation. We view the problem as one of sequential decision making under uncertainty, specifically within the bandit learning setting. Multi-armed bandits balance exploration and exploitation to provably guarantee average performance that converges to policies that are optimal with perfect hindsight. Our comprehensive empirical evaluation against a state-of-the-art commercial tuning tool demonstrates up to 75% speed-up on shifting and ad-hoc workloads and up to 28% speed-up on static workloads in analytical processing environments. In HTAP environments, our solution provides up to 59% speed-up on shifting and 51% speed-up on static workloads. Furthermore, our bandit framework outperforms deep reinforcement learning (RL) in terms of convergence speed and performance volatility (providing up to 58% speed-up).

View on arXiv PDF

Similar