DBOS · Apr 10

Decoupling Vector Data and Index Storage for Space Efficiency

arXiv:2604.09173 · 41.6
AI Analysis

This work addresses the storage overhead and performance problems of managing large-scale vector datasets in ANNS systems, and represents an incremental improvement.

The paper tackles storage inefficiency in disk-based approximate nearest neighbor search (ANNS) systems by decoupling vector data from index metadata, reducing storage by up to 58.7% while maintaining competitive performance.

Managing large-scale vector datasets with disk-based approximate nearest neighbor search (ANNS) systems faces critical efficiency challenges stemming from the co-location of vector data and auxiliary index metadata. Our analysis of state-of-the-art ANNS systems reveals that such co-location incurs substantial storage overhead, generates excessive reads during search queries, and causes severe write amplification during updates. We present DecoupleVS, a decoupled vector storage management framework that enables specialized optimizations for vector data and auxiliary index metadata. DecoupleVS incorporates various design techniques for effective compression, data layouts, search queries, and updates, so as to significantly reduce storage space while maintaining high search and update performance and high search accuracy. Evaluation on real-world public and proprietary billion-scale datasets shows that DecoupleVS reduces storage space by up to 58.7%, while delivering competitive or improved search query and update performance, compared to state-of-the-art monolithic disk-based ANNS systems.
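One source of the co-location overhead named in the abstract can be seen with simple arithmetic. The following is a back-of-the-envelope storage model under stated assumptions, not DecoupleVS's actual layout. It assumes a hypothetical monolithic index in which each node record stores its vector plus a fixed-size neighbour list (a count and `max_degree` IDs), packed into 4 KiB sectors so that small records never straddle a sector boundary. It compares this against a hypothetical decoupled layout where vectors and neighbour lists are stored in separate, densely packed files. The names `Params`, `colocated_bytes`, `decoupled_bytes`, and the `vector_ratio` compression factor are all illustrative.

```python
from dataclasses import dataclass

SECTOR = 4096  # assumed disk sector / page size in bytes


@dataclass
class Params:
    num_vectors: int
    dim: int
    bytes_per_dim: int   # e.g. 4 for float32, 1 for uint8
    max_degree: int      # maximum neighbours per graph node
    id_bytes: int = 4    # bytes per neighbour ID (also used for the count)


def colocated_bytes(p: Params) -> int:
    """Hypothetical monolithic layout: vector + neighbour list per record,
    records packed into sectors without crossing sector boundaries."""
    record = p.dim * p.bytes_per_dim + p.id_bytes * (1 + p.max_degree)
    per_sector = SECTOR // record
    if per_sector == 0:
        # A record larger than one sector occupies ceil(record / SECTOR) sectors.
        sectors = p.num_vectors * -(-record // SECTOR)
    else:
        # Leftover bytes at the end of each sector are wasted padding.
        sectors = -(-p.num_vectors // per_sector)
    return sectors * SECTOR


def decoupled_bytes(p: Params, vector_ratio: float = 1.0) -> int:
    """Hypothetical decoupled layout: vectors and neighbour lists in separate
    densely packed files; vector_ratio models optional compression that is
    applied to the vector data only."""
    vectors = int(p.num_vectors * p.dim * p.bytes_per_dim * vector_ratio)
    graph = p.num_vectors * p.id_bytes * (1 + p.max_degree)
    return vectors + graph


if __name__ == "__main__":
    p = Params(num_vectors=1_000_000, dim=128, bytes_per_dim=1, max_degree=64)
    mono = colocated_bytes(p)
    for ratio in (1.0, 0.5):
        dec = decoupled_bytes(p, ratio)
        print(f"ratio={ratio}: co-located {mono:,} B, decoupled {dec:,} B, "
              f"saved {1 - dec / mono:.1%}")
```

In this toy configuration, sector padding alone accounts for only a few percent. Once the two data types are stored separately, however, the vector file can be compressed independently of the graph. The paper's reported reduction comes from its own compression and layout techniques, which this sketch does not attempt to reproduce.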

