DB, AI | Oct 18, 2023

A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

arXiv:2310.11703v2 | 122 citations | h-index: 5
Originality: Synthesis-oriented
AI Analysis

This is an incremental work that synthesizes existing knowledge to serve as a practical resource for researchers and practitioners in AI and database management.

The paper provides a comprehensive survey of vector databases, reviewing storage and retrieval techniques, comparing advanced solutions, and outlining emerging opportunities with large language models.

Vector databases (VDBs) have emerged to manage high-dimensional data that exceed the capabilities of traditional database management systems, and are now tightly integrated with large language models as well as widely applied in modern artificial intelligence systems. Although relatively few studies describe existing or introduce new vector database architectures, the core technologies underlying VDBs, such as approximate nearest neighbor search, have been extensively studied and are well documented in the literature. In this work, we present a comprehensive review of the relevant algorithms to provide a general understanding of this booming research area. Specifically, we first provide a review of storage and retrieval techniques in VDBs, with detailed design principles and technological evolution. Then, we conduct an in-depth comparison of several advanced VDB solutions with their strengths, limitations, and typical application scenarios. Finally, we also outline emerging opportunities for coupling VDBs with large language models, including open research problems and trends, such as novel indexing strategies. This survey aims to serve as a practical resource, enabling readers to quickly gain an overall understanding of the current knowledge landscape in this rapidly developing area.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
