DSDBMay 27

Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

arXiv:2605.2906123.6h-index: 1
Predicted impact top 45% in DS · last 90 daysOriginality Synthesis-oriented
AI Analysis

For researchers in learned indexes, this provides a theoretical accounting framework for a specific architecture, but the results are scoped and incremental, with no claimed speed improvements.

The paper studies exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture, proving a two-sided accounting theorem for conditional residual answer entropy. Benchmarks on real datasets show overheads and bottlenecks compared to PGM-index, RadixSpline, and binary search.

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes