DB LGJun 25, 2025

Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis

Jiayong Qin, Xianyu Zhu, Qiyu Liu, Guangyi Zhang, Zhigang Cai, Jianwei Liao, Sha Hu, Jingshu Peng, Yingxia Shao, Lei Chen

arXiv:2506.20139v11.2h-index: 11

Originality Incremental advance

AI Analysis

This work addresses a key bottleneck in learned data structures for databases and systems, offering incremental theoretical and empirical insights to guide future designs.

The paper tackles the underexplored design and analysis of error-bounded Piecewise Linear Approximation (ε-PLA) algorithms in learned index structures, establishing an improved lower bound of Ω(κ·ε²) on expected segment coverage and benchmarking state-of-the-art algorithms to highlight trade-offs in model accuracy, size, and query performance.

A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ε$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ\cdot ε^2)$ on the expected segment coverage for existing $ε$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ε$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.

View on arXiv PDF

Similar