DB · LG · OC · Feb 7, 2025

DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees

arXiv:2502.05369v2 · 4 citations · h-index: 2 · Proc VLDB Endow
Originality: Highly original
AI Analysis

This work targets developers and users of key-value stores, particularly LSM-tree-based systems, and offers an incremental improvement in indexing and data access efficiency.

DobLIX optimizes both index lookups and data access in Log-Structured Merge (LSM) tree-based key-value stores, improving throughput by 1.19 to 2.21 times over state-of-the-art methods within RocksDB. Its training process targets low index lookup cost and low data access cost together, rather than lookup cost alone.

In this paper, we introduce DobLIX, a dual-objective learned index specifically designed for Log-Structured Merge (LSM) tree-based key-value stores. Although traditional learned indexes focus exclusively on optimizing index lookups, they often overlook the impact of data access from storage, resulting in performance bottlenecks. DobLIX addresses this by incorporating a second objective, data access optimization, into the learned index training process. This dual-objective approach ensures that both index lookup cost and data access cost are minimized, leading to significant improvements in read performance while maintaining write efficiency in real-world LSM-tree systems. Additionally, DobLIX features a reinforcement learning agent that dynamically tunes the system parameters, allowing it to adapt to varying workloads in real time. Experimental results using real-world datasets demonstrate that DobLIX reduces indexing overhead and improves throughput by 1.19 to 2.21 times compared to state-of-the-art methods within RocksDB, a widely used LSM-tree-based storage engine.
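
To make the dual-objective idea concrete, here is a minimal toy sketch. It is not DobLIX's actual algorithm, which the abstract does not specify. It assumes a single linear model over sorted keys, a fixed block size, and hypothetical weights `w_lookup` and `w_io`; all function names are illustrative. The cost combines the in-memory search effort inside the model's error window with the average number of storage blocks that window spans.

```python
import math

def fit_linear(keys):
    """Least-squares line mapping key -> position in the sorted key array."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_p - slope * mean_k

def max_error(keys, slope, intercept):
    """Largest gap between predicted and true position (the error bound)."""
    return max(abs(slope * k + intercept - i) for i, k in enumerate(keys))

def blocks_touched(pred, eps, n, block_size):
    """Number of fixed-size blocks covered by the window [pred - eps, pred + eps]."""
    lo = max(0, math.floor(pred - eps))
    hi = min(n - 1, math.ceil(pred + eps))
    return hi // block_size - lo // block_size + 1

def dual_cost(keys, block_size, w_lookup=1.0, w_io=10.0):
    """Hypothetical combined cost: weighted lookup effort plus weighted block reads."""
    slope, intercept = fit_linear(keys)
    eps = max_error(keys, slope, intercept)
    n = len(keys)
    # Objective 1: binary-search steps needed inside the error window.
    lookup = math.log2(2 * eps + 1)
    # Objective 2: average number of blocks fetched per key lookup.
    io = sum(blocks_touched(slope * k + intercept, eps, n, block_size)
             for k in keys) / n
    return w_lookup * lookup + w_io * io

uniform = list(range(100))               # perfectly linear key distribution
skewed = [i * i for i in range(100)]     # poorly fit by one line
print(dual_cost(uniform, block_size=10))
print(dual_cost(skewed, block_size=10))
```

In this toy setting, a model that fits the keys poorly pays twice: the search window is wider, and it crosses more block boundaries. That is the kind of coupling a lookup-only objective would miss.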

