DB · Apr 1

Making Array-Based Translation Practical for Modern, High-Performance Buffer Management

Xinjing Zhou, Jinming Hu, Andrew Pavlo, Michael Stonebraker

arXiv:2604.00423 · 4.8 · h-index: 4

Predicted impact: top 92% in DB · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the need for efficient buffer management in database systems that handle mixed workloads, though it is an incremental advance that revives an old idea with new optimizations.

The paper tackles the problem of designing a buffer pool translation mechanism that supports diverse modern workloads such as analytics and vector search, and presents Calico, which matches or outperforms state-of-the-art methods and delivers up to 3.9× in-memory and 6.5× larger-than-memory speedups for PostgreSQL vector search.

Modern buffer pools must now support a broader workload mix than classic OLTP alone. In addition to B-tree lookups, database systems increasingly serve scan-heavy analytics and vector-search indexes whose graph traversals produce irregular, high-fan-out access patterns. These workloads require a translation mechanism (mapping logical page IDs to resident frames) that is simultaneously fast across these diverse access patterns, deployable in user space, compatible with huge pages, easy to integrate, and still under DBMS control for eviction and I/O. Existing designs satisfy only subsets of these goals. This paper presents Calico, a practical DBMS-controlled buffer pool built around array-based translation, a decades-old idea that was once dismissed but is now viable on modern hardware. Calico decouples logical translation from OS page tables so that the DBMS can combine low-overhead translation with huge-page-backed frames and fine-grained page management. To make array translation practical and performant for DBMSes with large, sparse, hierarchical page identifiers, Calico introduces three techniques: multi-level translation with path caching, hole punching to reclaim cold translation memory, and group prefetch to exploit parallelism. Our evaluation across scans, OLTP-style B-tree accesses, and vector search shows that Calico matches or outperforms the existing state of the art in both in-memory and out-of-memory performance. We also implement Calico as a drop-in replacement for PostgreSQL's buffer manager and integrate it with pgvector. Across vector-search and scan-heavy workloads, Calico delivers up to 3.9× in-memory and 6.5× larger-than-memory speedups for PostgreSQL vector search and speeds up scan-heavy queries by up to 3×.
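To make the idea of array-based translation with multiple levels and a path cache concrete, here is a minimal Python sketch. It is illustrative only and is not Calico's implementation. Everything in it is a hypothetical assumption: the `(relation_id, block_no)` page-ID layout, the 1024-entry leaf size, the class and method names, the one-entry path cache, and the use of a dictionary (rather than an array) for the sparse top level. "Hole punching" is approximated by dropping leaf arrays that hold no resident pages. A real buffer pool would reclaim the memory backing cold translation ranges and would likely judge coldness by access recency rather than emptiness.

```python
from array import array

LEAF_BITS = 10                  # hypothetical: 2**10 = 1024 entries per leaf array
LEAF_SIZE = 1 << LEAF_BITS
LEAF_MASK = LEAF_SIZE - 1
NOT_RESIDENT = -1               # sentinel: page currently has no buffer frame


class MultiLevelTranslation:
    """Hypothetical two-level page-ID -> frame table with a one-entry path cache.

    Page IDs are (relation_id, block_no) pairs. The top level is a dict keyed by
    relation (relation IDs are sparse); below it, each relation owns a directory
    list whose slots hold leaf arrays (or None) indexed by the low bits of block_no.
    """

    def __init__(self):
        self.dirs = {}          # relation_id -> list of leaf arrays or None
        self.cached_key = None  # (relation_id, leaf_no) of the last leaf reached
        self.cached_leaf = None

    def _leaf(self, rel, block_no, create):
        leaf_no = block_no >> LEAF_BITS
        if (rel, leaf_no) == self.cached_key:   # path-cache hit: skip upper levels
            return self.cached_leaf
        directory = self.dirs.get(rel)
        if directory is None:
            if not create:
                return None
            directory = self.dirs[rel] = []
        if leaf_no >= len(directory):
            if not create:
                return None
            directory.extend([None] * (leaf_no + 1 - len(directory)))
        leaf = directory[leaf_no]
        if leaf is None:
            if not create:
                return None
            # Allocate a leaf lazily; every entry starts out non-resident.
            leaf = directory[leaf_no] = array("q", [NOT_RESIDENT]) * LEAF_SIZE
        self.cached_key, self.cached_leaf = (rel, leaf_no), leaf
        return leaf

    def map(self, rel, block_no, frame):
        self._leaf(rel, block_no, create=True)[block_no & LEAF_MASK] = frame

    def lookup(self, rel, block_no):
        leaf = self._leaf(rel, block_no, create=False)
        return NOT_RESIDENT if leaf is None else leaf[block_no & LEAF_MASK]

    def unmap(self, rel, block_no):
        leaf = self._leaf(rel, block_no, create=False)
        if leaf is not None:
            leaf[block_no & LEAF_MASK] = NOT_RESIDENT

    def punch_cold_leaves(self):
        """Drop leaves with no resident pages (a stand-in for hole punching)."""
        freed = 0
        for directory in self.dirs.values():
            for i, leaf in enumerate(directory):
                if leaf is not None and all(f == NOT_RESIDENT for f in leaf):
                    directory[i] = None
                    freed += 1
        self.cached_key = self.cached_leaf = None  # the cached leaf may be gone
        return freed


if __name__ == "__main__":
    t = MultiLevelTranslation()
    t.map(16384, 5, 0)            # block 5 lives in leaf 0
    t.map(16384, 5000, 1)         # block 5000 lives in leaf 4 (5000 >> 10)
    print(t.lookup(16384, 5))     # 0
    print(t.lookup(16384, 6))     # -1 (same leaf, not resident)
    t.unmap(16384, 5000)
    print(t.punch_cold_leaves())  # 1 (leaf 4 is now empty)
```

In this sketch, the one-entry path cache pays off for sequential scans, where consecutive block numbers usually fall in the same leaf and the upper levels can be skipped. Random B-tree or graph-traversal lookups would mostly miss a single cached path.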

