A Model for Learned Bloom Filters, and Optimizing by Sandwiching
This work addresses performance improvements for data structures like Bloom filters, but it appears incremental as it builds on recent suggestions without introducing a new paradigm.
The paper tackles the problem of enhancing Bloom filters by using machine learning to model the dataset, resulting in a model that clarifies performance guarantees, provides size estimation for learning functions, and introduces a sandwiching method for optimization.
Recent work has suggested enhancing Bloom filters by using a pre-filter, based on applying machine learning to determine a function that models the data set the Bloom filter is meant to represent. Here we model such learned Bloom filters,, with the following outcomes: (1) we clarify what guarantees can and cannot be associated with such a structure; (2) we show how to estimate what size the learning function must obtain in order to obtain improved performance; (3) we provide a simple method, sandwiching, for optimizing learned Bloom filters; and (4) we propose a design and analysis approach for a learned Bloomier filter, based on our modeling approach.