DB · AI · LG · ML · Jun 23, 2024

UQE: A Query Engine for Unstructured Databases

arXiv:2407.09522v2 · 30 citations
Originality: Highly original
AI Analysis

The paper addresses the challenge of performing analytics on unstructured data, which is prevalent in real-world applications. It introduces a query engine that uses LLMs to answer queries that traditional structured-data methods cannot handle directly.

The paper tackles the problem of analyzing unstructured data such as images and conversations. It proposes a Universal Query Engine (UQE) that uses Large Language Models (LLMs) to execute queries efficiently and accurately across modalities, and it demonstrates the engine on tasks such as conditional aggregation and semantic retrieval.

Analytics on structured data is a mature field with many successful methods. However, most real-world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable unstructured data analytics. In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections. This engine accepts queries in a Universal Query Language (UQL), a dialect of SQL that provides full natural language flexibility in specifying conditions and operators. The new engine leverages the ability of LLMs to conduct analysis of unstructured data, while also allowing us to exploit advances in sampling and optimization techniques to achieve efficient and accurate query execution. In addition, we borrow techniques from classical compiler theory to better orchestrate the workflow between sampling methods and foundation model calls. We demonstrate the efficiency of UQE on data analytics across different modalities, including images, dialogs and reviews, across a range of useful query types, including conditional aggregation, semantic retrieval and abstraction aggregation.
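To make the idea of sampling-based conditional aggregation concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it uses plain uniform sampling, and the function `mentions_refund` is a hypothetical keyword-matching stand-in for the LLM call that would judge a natural-language condition. The query shown in the comment is illustrative only and may not match actual UQL syntax.

```python
import random
from typing import Callable, Sequence

# Illustrative query (assumed syntax, not taken from the paper):
#   SELECT COUNT(*) FROM reviews WHERE "the review asks for a refund"


def estimate_conditional_count(
    rows: Sequence[str],
    predicate: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> float:
    """Estimate how many rows satisfy `predicate` from a uniform sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not rows:
        return 0.0
    rng = random.Random(seed)
    k = min(sample_size, len(rows))
    # Uniform sampling without replacement; each sampled row costs one
    # predicate evaluation (one LLM call in a real system).
    sample = rng.sample(list(rows), k)
    hits = sum(1 for row in sample if predicate(row))
    # Scale the sample hit rate up to the size of the full collection.
    return len(rows) * hits / k


def mentions_refund(text: str) -> bool:
    # Hypothetical stand-in for an LLM judging the natural-language condition.
    return "refund" in text.lower()


reviews = [
    "I want a refund",
    "Great product",
    "Refund took weeks",
    "Fast shipping",
] * 25

# Approximate answer from 20 predicate calls instead of 100.
estimate = estimate_conditional_count(reviews, mentions_refund, sample_size=20)
# Sampling every row reduces to the exact count.
exact = estimate_conditional_count(reviews, mentions_refund, sample_size=len(reviews))
```

The point of the sketch is the cost trade-off: the estimate needs only `sample_size` predicate evaluations, at the price of sampling error, whereas the exact count needs one evaluation per row.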
