dIR -- Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models
This paper addresses unified querying of structured and unstructured data for conversational search, though the contribution appears incremental because it builds on existing LLM and text-to-SQL techniques.
The paper tackles the challenge of querying both structured and unstructured data in natural language conversations. It introduces dIR (Discrete Information Retrieval), which uses Large Language Models to transform free text into columnar form and to convert natural language questions into SQL, enabling a class of queries that traditional methods cannot handle.
Data is stored in both structured and unstructured form. Querying both, to power natural language conversations, is a challenge. This paper introduces dIR, Discrete Information Retrieval, which provides a unified interface to query both free text and structured knowledge. Specifically, a Large Language Model (LLM) transforms text into an expressive representation. Once the text has been extracted into columnar form, it can be queried via a text-to-SQL Semantic Parser, with an LLM converting natural language into SQL. Where desired, the conversation may be driven by a multi-step reasoning conversational agent. We validate our approach on a proprietary question/answer data set and conclude that dIR makes a whole new class of queries on free text possible, compared with traditionally fine-tuned dense-embedding-model-based Information Retrieval (IR) and SQL-based Knowledge Bases (KB). For sufficiently complex queries, dIR can succeed where no other method stands a chance.
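The two-stage flow described in the abstract (extract free text into columns, then answer questions with SQL) can be illustrated with a minimal sketch. This is not the authors' implementation: both LLM calls are replaced by hypothetical stubs (`stub_extractor` uses keyword rules, `stub_text_to_sql` returns one canned query), and the table name `docs`, its columns, and the sample reviews are all assumptions made for illustration.

```python
import sqlite3

def stub_extractor(doc_id, text):
    """Hypothetical stand-in for the LLM extraction step (keyword rules only)."""
    lowered = text.lower()
    if "laptop" in lowered:
        product = "laptop"
    elif "phone" in lowered:
        product = "phone"
    else:
        product = "unknown"
    negative_cues = ("broke", "terrible", "slow")
    sentiment = "negative" if any(cue in lowered for cue in negative_cues) else "positive"
    return {"doc_id": doc_id, "product": product, "sentiment": sentiment}

def stub_text_to_sql(question):
    """Hypothetical stand-in for the LLM text-to-SQL parser (one canned query)."""
    return (
        "SELECT product, COUNT(*) FROM docs "
        "WHERE sentiment = 'negative' "
        "GROUP BY product ORDER BY product"
    )

def build_table(texts, extractor):
    """Stage 1: turn each free-text document into one row of discrete columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (doc_id INTEGER, product TEXT, sentiment TEXT)")
    rows = [extractor(i, text) for i, text in enumerate(texts)]
    conn.executemany("INSERT INTO docs VALUES (:doc_id, :product, :sentiment)", rows)
    return conn

def answer(conn, question, parser):
    """Stage 2: translate the question to SQL and run it over the extracted table."""
    return conn.execute(parser(question)).fetchall()

# Assumed sample data, invented for this sketch.
reviews = [
    "The laptop broke after a week.",
    "Great phone, love the camera.",
    "This laptop is terribly slow.",
    "The phone screen is terrible.",
]
conn = build_table(reviews, stub_extractor)
result = answer(conn, "How many negative reviews per product?", stub_text_to_sql)
print(result)
```

The aggregate question in the example (a count grouped by an extracted attribute) is the kind of query that embedding-based retrieval alone cannot answer, since it requires reading every document rather than the top few matches. In a real system the stubs would be replaced by LLM calls, and the extracted schema would be chosen to fit the corpus.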