DB DS LGDec 3, 2019

Learning Multi-dimensional Indexes

Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska

arXiv:1912.01668v1251 citations

Originality Highly original

AI Analysis

This addresses performance tuning challenges for database engineers handling multi-dimensional data, though it is incremental as it builds on existing indexing concepts.

The paper tackles the problem of inconsistent performance in multi-dimensional indexes for analytical databases by introducing Flood, an adaptive in-memory index that jointly optimizes structure and storage, achieving up to 1000x faster range scans than state-of-the-art methods on real-world datasets.

Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood achieves up to three orders of magnitude faster performance for range scans with predicates than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.

View on arXiv PDF

Similar