DBDSLGJun 30, 2022

Proteus: A Self-Designing Range Filter

arXiv:2207.01503v136 citationsh-index: 78
Originality Highly original
AI Analysis

This addresses the need for robust and efficient range filters in database systems, offering a novel self-designing approach that improves performance across varied workloads.

The paper tackles the problem of designing approximate range filters by introducing Proteus, a self-designing filter that configures itself based on sampled data to optimize false positive rates for given space constraints, achieving up to 5.3x performance improvement in RocksDB over state-of-the-art methods.

We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic design spaces of state-of-the-art range filters to achieve robust performance across a larger variety of use cases. At the core of Proteus lies our Contextual Prefix FPR (CPFPR) model - a formal framework for the FPR of prefix-based filters across their design spaces. We empirically demonstrate the accuracy of our model and Proteus' ability to optimize over both synthetic workloads and real-world datasets. We further evaluate Proteus in RocksDB and show that it is able to improve end-to-end performance by as much as 5.3x over more brittle state-of-the-art methods such as SuRF and Rosetta. Our experiments also indicate that the cost of modeling is not significant compared to the end-to-end performance gains and that Proteus is robust to workload shifts.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes