ARCADE: A Real-Time Data System for Hybrid and Continuous Query Processing across Diverse Data Modalities
ARCADE addresses the need for efficient real-time analytics across diverse data modalities and represents an incremental improvement over existing multimodal and real-time database systems.
The paper tackles the problem of real-time semantic search and retrieval over multimodal data by introducing ARCADE, a system that supports high-throughput ingestion and hybrid continuous query processing, outperforming leading multimodal systems by up to 7.4x on read-heavy and 1.4x on write-heavy workloads.
The explosive growth of multimodal data spanning text, image, video, spatial, and relational modalities, coupled with the need for real-time semantic search and retrieval over these data, has outpaced the capabilities of existing multimodal and real-time database systems. These systems either lack efficient ingestion and continuous query capabilities or fall short in supporting expressive hybrid analytics. We introduce ARCADE, a real-time data system that efficiently supports high-throughput ingestion and expressive hybrid and continuous query processing across diverse data types. ARCADE introduces three components: a unified disk-based secondary index on LSM-based storage for vector, spatial, and text data; a comprehensive cost-based query optimizer for hybrid queries; and an incremental materialized view framework for efficient continuous queries. Built on the open-source RocksDB storage engine and the MySQL query engine, ARCADE outperforms leading multimodal data systems by up to 7.4x on read-heavy workloads and 1.4x on write-heavy workloads.
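The general principle behind incremental materialized views for continuous queries can be illustrated with a toy example. This is a minimal sketch, not ARCADE's framework. The view (a per-category count of rows inside a spatial bounding box) and all names (Row, IncrementalCountView, recompute) are hypothetical. Each insert or delete updates the stored result by its delta, so the continuous query never rescans the base data. A full recomputation is included only to check that the incremental result stays correct.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical record with a categorical attribute and a 2-D location.
    key: int
    category: str
    x: float
    y: float


class IncrementalCountView:
    """Hypothetical continuous query, maintained incrementally:
    SELECT category, COUNT(*) FROM t
    WHERE (x, y) inside bbox GROUP BY category
    """

    def __init__(self, bbox):
        self.bbox = bbox  # (xmin, ymin, xmax, ymax)
        self.counts = defaultdict(int)

    def _matches(self, row):
        xmin, ymin, xmax, ymax = self.bbox
        return xmin <= row.x <= xmax and ymin <= row.y <= ymax

    def on_insert(self, row):
        # Apply only the new row's delta; the base table is not rescanned.
        if self._matches(row):
            self.counts[row.category] += 1

    def on_delete(self, row):
        # Retract the row's contribution and drop groups that become empty.
        if self._matches(row):
            self.counts[row.category] -= 1
            if self.counts[row.category] == 0:
                del self.counts[row.category]

    def result(self):
        return dict(self.counts)


def recompute(rows, bbox):
    # Full re-evaluation from scratch, used only to validate the view.
    xmin, ymin, xmax, ymax = bbox
    counts = defaultdict(int)
    for r in rows:
        if xmin <= r.x <= xmax and ymin <= r.y <= ymax:
            counts[r.category] += 1
    return dict(counts)


# Usage: stream rows into the view, then retract one.
bbox = (0.0, 0.0, 10.0, 10.0)
table = [
    Row(1, "cafe", 1.0, 1.0),
    Row(2, "cafe", 5.0, 5.0),
    Row(3, "park", 20.0, 20.0),  # outside the box, never counted
    Row(4, "park", 3.0, 3.0),
]
view = IncrementalCountView(bbox)
for row in table:
    view.on_insert(row)
after_inserts = view.result()  # {"cafe": 2, "park": 1}

removed = table.pop(1)  # delete Row 2 from the base table
view.on_delete(removed)  # and propagate the delete to the view
after_delete = view.result()  # {"cafe": 1, "park": 1}
```

A real system would also have to handle updates, aggregates that are not invertible (such as MIN/MAX or top-k), and concurrency with ingestion. This sketch covers none of them.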