DB, CL · May 25, 2025

SQUiD: Synthesizing Relational Databases from Unstructured Text

arXiv:2505.19025v1 · 2 citations · h-index: 10 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of managing unstructured data for database practitioners, though it appears incremental because it builds on existing LLM and neurosymbolic methods.

The paper tackles the problem of converting unstructured text into relational databases by introducing SQUiD, a neurosymbolic framework that uses large language models to generate schemas and populate tables, achieving consistent performance improvements over baselines across diverse datasets.

Relational databases are central to modern data management, yet most data exists in unstructured forms like text documents. To bridge this gap, we leverage large language models (LLMs) to automatically synthesize a relational database by generating its schema and populating its tables from raw text. We introduce SQUiD, a novel neurosymbolic framework that decomposes this task into four stages, each with specialized techniques. Our experiments show that SQUiD consistently outperforms baselines across diverse datasets.
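The abstract does not name SQUiD's four stages, so the sketch below does not attempt to reproduce them. It is a minimal illustration, under stated assumptions, of the symbolic end of the task only: given a schema and a set of extracted tuples (both hypothetical, hard-coded stand-ins for what an LLM might produce from raw text), it creates the tables and populates them with Python's built-in `sqlite3` module. The names `SCHEMA`, `ROWS`, and `build_database` are illustrative and do not come from the paper.

```python
import sqlite3

# Hypothetical schema, standing in for what an LLM might propose after reading text.
SCHEMA = {
    "person": ["person_id INTEGER PRIMARY KEY", "name TEXT NOT NULL"],
    "employment": [
        "person_id INTEGER REFERENCES person(person_id)",
        "employer TEXT",
        "start_year INTEGER",
    ],
}

# Hypothetical extracted tuples, standing in for LLM output; parent rows come first.
ROWS = {
    "person": [(1, "Ada Lovelace"), (2, "Alan Turing")],
    "employment": [(2, "University of Manchester", 1948)],
}


def build_database(schema, rows):
    """Create an in-memory SQLite database from a schema and populate it with tuples."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    for table, columns in schema.items():
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    for table, tuples in rows.items():
        if not tuples:
            continue
        # One "?" placeholder per column, e.g. "?, ?, ?" for three values.
        placeholders = ", ".join("?" * len(tuples[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", tuples)
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_database(SCHEMA, ROWS)
    query = (
        "SELECT p.name, e.employer FROM person p "
        "JOIN employment e USING (person_id)"
    )
    print(db.execute(query).fetchall())
    # [('Alan Turing', 'University of Manchester')]
```

Enabling `PRAGMA foreign_keys` is a deliberate choice in this sketch: SQLite leaves foreign-key enforcement off by default, so without it an extracted `employment` row pointing at a nonexistent `person_id` would be stored silently instead of raising `sqlite3.IntegrityError`. This is one simple way a symbolic layer can reject inconsistent LLM output.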
