DBLGJun 25, 2025

Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis

arXiv:2506.20139v1h-index: 11
Originality Incremental advance
AI Analysis

This work addresses a key bottleneck in learned data structures for databases and systems, offering incremental theoretical and empirical insights to guide future designs.

The paper tackles the underexplored design and analysis of error-bounded Piecewise Linear Approximation (ε-PLA) algorithms in learned index structures, establishing an improved lower bound of Ω(κ·ε²) on expected segment coverage and benchmarking state-of-the-art algorithms to highlight trade-offs in model accuracy, size, and query performance.

A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ε$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ\cdot ε^2)$ on the expected segment coverage for existing $ε$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ε$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes