DB · AI · Mar 6, 2025

PDX: A Data Layout for Vector Similarity Search

arXiv:2503.04422v1 · 13 citations · h-index: 4 · Proc. ACM Manag. Data
Originality: Incremental advance
AI Analysis

PDX addresses efficiency challenges in vector databases and similarity search applications, offering incremental improvements over existing vector layouts and dimension-pruning methods.

The paper accelerates vector similarity search with PDX, a data layout that stores multiple vectors in one block and lays out their dimensions vertically. Using only auto-vectorized scalar code, PDX is on average 40% faster than SIMD-optimized distance kernels on horizontal vector storage. When the dimension-pruning algorithms ADSampling and BSA run on PDX, their benefit is restored to 2-7x.

We propose Partition Dimensions Across (PDX), a data layout for vectors (e.g., embeddings) that, similar to PAX [6], stores multiple vectors in one block, using a vertical layout for the dimensions (Figure 1). PDX accelerates exact and approximate similarity search thanks to its dimension-by-dimension search strategy that operates on multiple vectors at a time in tight loops. It beats SIMD-optimized distance kernels on standard horizontal vector storage (avg 40% faster), relying only on scalar code that gets auto-vectorized. We combined the PDX layout with the recent dimension-pruning algorithms ADSampling [19] and BSA [52], which accelerate approximate vector search. We found that these algorithms on the horizontal vector layout can lose to SIMD-optimized linear scans, even if they are SIMD-optimized themselves. However, when used on PDX, their benefit is restored to 2-7x. We find that search on PDX is especially fast if only a limited number of dimensions has to be scanned fully, which is what the dimension-pruning approaches do. We finally introduce PDX-BOND, an even more flexible dimension-pruning strategy, with good performance on exact search and reasonable performance on approximate search. Unlike previous pruning algorithms, it can work on vector data "as-is" without preprocessing, making it attractive for vector databases with frequent updates.
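To make the vertical layout concrete, here is a minimal Python sketch of the two ideas in the abstract: storing a block dimension-major, and scanning it dimension by dimension while dropping vectors early. It is not the authors' implementation and not PDX-BOND. The function names (`to_vertical_block`, `l2_squared_vertical`, `scan_with_pruning`) and the fixed `threshold` parameter are assumptions made for illustration. The pruning rule shown is the generic exact one: a partial squared L2 distance can only grow as more dimensions are added, so a vector whose partial sum already exceeds the threshold can be skipped. Python will not auto-vectorize these loops; the sketch only shows the access pattern that a compiled scalar loop would expose to the compiler.

```python
from typing import List


def to_vertical_block(vectors: List[List[float]]) -> List[List[float]]:
    """Transpose row-major vectors into a dimension-major block.

    block[d][i] holds dimension d of vector i. Assumes a non-empty block
    in which all vectors have the same length.
    """
    return [list(column) for column in zip(*vectors)]


def l2_squared_vertical(block: List[List[float]], query: List[float]) -> List[float]:
    """Squared L2 distance from the query to every vector in the block,
    accumulated one dimension at a time."""
    num_vectors = len(block[0])
    distances = [0.0] * num_vectors
    for d, column in enumerate(block):
        q = query[d]
        # Inner loop runs over many vectors for one dimension: contiguous
        # reads from `column`, one running sum per vector.
        for i in range(num_vectors):
            diff = column[i] - q
            distances[i] += diff * diff
    return distances


def scan_with_pruning(block: List[List[float]], query: List[float],
                      threshold: float) -> List[int]:
    """Return indices of vectors whose full squared L2 distance is <= threshold.

    Illustrative exact pruning (an assumption, not the paper's algorithm):
    after each dimension, vectors whose partial distance already exceeds
    the threshold are removed from the set still being scanned.
    """
    num_vectors = len(block[0])
    distances = [0.0] * num_vectors
    alive = list(range(num_vectors))
    for d, column in enumerate(block):
        q = query[d]
        for i in alive:
            diff = column[i] - q
            distances[i] += diff * diff
        # Partial sums never decrease, so dropping these vectors is exact.
        alive = [i for i in alive if distances[i] <= threshold]
        if not alive:
            break
    return alive
```

In a k-nearest-neighbor search the threshold would typically be the distance of the current k-th best candidate rather than a fixed number; that bookkeeping is left out here to keep the sketch short.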

Code Implementations: 1 repo
