Cascaded Learned Bloom Filter for Optimal Model-Filter Size Balance and Fast Rejection
This work addresses efficiency challenges in learned Bloom filters, offering incremental improvements to a data structure widely used in databases and networking.
The paper tackles the suboptimal balance between model and filter sizes and slow rejection in learned Bloom filters by proposing the Cascaded Learned Bloom Filter (CLBF), which reduces memory usage by up to 24% and decreases reject time by up to 14 times compared to state-of-the-art methods.
Recent studies have demonstrated that learned Bloom filters, which combine machine learning with the classical Bloom filter, can achieve superior memory efficiency. However, existing learned Bloom filters face two critical unresolved challenges: the balance between the machine learning model size and the Bloom filter size is not optimal, and the reject time cannot be minimized effectively. We propose the Cascaded Learned Bloom Filter (CLBF) to address these issues. Our dynamic programming-based optimization automatically selects configurations that achieve an optimal balance between the model and filter sizes while minimizing reject time. Experiments on real-world datasets show that CLBF reduces memory usage by up to 24% and decreases reject time by up to 14 times compared to state-of-the-art learned Bloom filters.
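To make the cascade idea concrete, the following is a minimal sketch of one way to alternate Bloom filters and score-based checks so that a non-member can be rejected as early as possible. It is an illustration under stated assumptions, not the paper's architecture or its dynamic programming optimization: the class names `BloomFilter` and `CascadeSketch`, the fixed filter sizes, and the hand-written `toy_score` function (standing in for a trained model) are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class CascadeSketch:
    """Hypothetical cascade: each stage is a Bloom filter followed by a score check."""

    def __init__(self, stages, num_bits=1024, num_hashes=3):
        # stages is a list of (score_fn, threshold) pairs; sizes are fixed here,
        # whereas the paper selects its configuration by optimization.
        self.stages = [(fn, t, BloomFilter(num_bits, num_hashes)) for fn, t in stages]
        self.backup = BloomFilter(num_bits, num_hashes)

    def build(self, keys):
        remaining = list(keys)
        for score_fn, threshold, bloom in self.stages:
            for key in remaining:
                bloom.add(key)  # every key that reaches this stage is stored
            # keys scoring at or above the threshold are accepted at this stage
            remaining = [k for k in remaining if score_fn(k) < threshold]
        for key in remaining:
            self.backup.add(key)  # keys no stage accepted go to the backup filter

    def query(self, key):
        for score_fn, threshold, bloom in self.stages:
            if key not in bloom:
                return False  # early rejection: later stages are never evaluated
            if score_fn(key) >= threshold:
                return True
        return key in self.backup


def toy_score(key):
    # Stand-in for a learned model: treats keys containing "evil" as likely members.
    return 1.0 if "evil" in key else 0.0


keys = [f"evil-{i}.example" for i in range(50)] + [f"odd-{i}.example" for i in range(50)]
cascade = CascadeSketch([(toy_score, 0.5)])
cascade.build(keys)
print(all(cascade.query(k) for k in keys))
```

By construction, every inserted key is either accepted by a stage's score check or stored in the backup filter, so the sketch preserves the no-false-negative guarantee of a classical Bloom filter. False positives remain possible for non-members, and their rate depends on the filter sizes and thresholds chosen.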