A Reinforcement Learning Based R-Tree for Spatial Data Indexing in Dynamic Environments
This work addresses a domain-specific problem for database systems by offering an incremental improvement over existing R-Tree variants without requiring structural changes.
The paper tackles the problem of improving query performance in spatial data indexing by using reinforcement learning to optimize R-Tree construction decisions, such as subtree selection and node splitting, resulting in faster query processing times on datasets with over 100 million objects.
Learned indices have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require to replace both the indices and query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to improve on the query performance of the classic R-Tree without the need of changing its structure or query processing algorithms. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building an R-Tree, instead of relying on hand-crafted heuristic rules currently used by R-Tree and its variants. Experiments on real and synthetic datasets with up to more than 100 million spatial objects clearly show that our RL based index outperforms R-Tree and its variants in terms of query processing time.