Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis
This work addresses a key bottleneck in learned data structures for databases and systems, offering incremental theoretical and empirical insights to guide future designs.
The paper tackles the underexplored design and analysis of error-bounded Piecewise Linear Approximation (ε-PLA) algorithms in learned index structures, establishing an improved lower bound of Ω(κ·ε²) on expected segment coverage and benchmarking state-of-the-art algorithms to highlight trade-offs in model accuracy, size, and query performance.
A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ε$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ\cdot ε^2)$ on the expected segment coverage for existing $ε$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ε$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.