DB · May 10

LDI: Localized Data Imputation for Text-Rich Tables

arXiv:2506.16616 · 38.51 citations · h-index: 6 · Has Code
Predicted impact top 41% in DB · last 90 days · Originality: Incremental advance
AI Analysis

For practitioners dealing with missing data in text-rich tables, LDI offers a more accurate, scalable, and interpretable imputation method, particularly beneficial in high-stakes applications.

LDI introduces a localized imputation framework for text-rich tables that selects a compact, contextually relevant subset of data for each missing value, achieving up to 8% higher accuracy than state-of-the-art methods with hosted LLMs and even greater gains with small local models.

Missing values are pervasive in real-world tabular data and can significantly impair downstream analysis. Imputing them is especially challenging in text-rich tables, where dependencies are implicit, complex, and dispersed across long textual fields. Recent work has explored using Large Language Models (LLMs) for data imputation, yet existing approaches typically process entire tables or loosely related contexts, which can compromise accuracy, scalability, and explainability. We introduce LDI, a novel framework that leverages LLMs through localized reasoning, selecting a compact, contextually relevant subset of attributes and tuples for each missing value. This targeted selection reduces noise, improves scalability, and provides transparent attribution by revealing the dependency relations that justify each selected attribute and the evidence behind each retrieved tuple. It makes clear not only which data influenced a prediction, but also why it was chosen. Through extensive experiments on real and synthetic datasets, we demonstrate that LDI consistently outperforms state-of-the-art imputation methods, achieving up to 8% higher accuracy with hosted LLMs and even greater gains with small local models. The improved interpretability and robustness also make LDI well-suited for high-stakes data management applications. Our code and datasets are publicly available at https://github.com/soroushomidvar/LDI.
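The localized selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the token-overlap (Jaccard) similarity, the leave-one-out attribute score, the function names, and the restaurant rows are all assumptions made for illustration, and no LLM is called. The sketch only shows how a prompt might be limited to the attributes and tuples that look relevant to one missing cell.

```python
# Toy sketch of localized context selection (hypothetical, not LDI's code).

def tokens(value):
    """Lowercased whitespace tokens of a cell; empty set for a missing cell."""
    return set(str(value).lower().split()) if value is not None else set()

def jaccard(a, b):
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def attribute_score(rows, attr, target):
    """Leave-one-out check: how often does the most similar row on `attr`
    (with non-zero similarity) carry the same target value?"""
    complete = [r for r in rows if r[target] is not None]
    hits = 0
    for r in complete:
        others = [o for o in complete if o is not r]
        if not others:
            continue
        best = max(others, key=lambda o: jaccard(tokens(r[attr]), tokens(o[attr])))
        sim = jaccard(tokens(r[attr]), tokens(best[attr]))
        if sim > 0 and best[target] == r[target]:
            hits += 1
    return hits / len(complete) if complete else 0.0

def select_attributes(rows, target, k=2):
    """Keep at most k attributes whose score is above zero."""
    scores = {a: attribute_score(rows, a, target) for a in rows[0] if a != target}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [a for a in ranked[:k] if scores[a] > 0]

def retrieve_tuples(rows, query, attrs, target, m=2):
    """Return the m complete rows most similar to `query` on the selected attributes."""
    complete = [r for r in rows if r[target] is not None]
    def sim(r):
        return sum(jaccard(tokens(query[a]), tokens(r[a])) for a in attrs)
    return sorted(complete, key=sim, reverse=True)[:m]

def build_prompt(query, attrs, examples, target):
    """Assemble a compact prompt from the selected attributes and tuples only."""
    lines = [f"Fill in the missing '{target}' value."]
    for ex in examples:
        shown = "; ".join(f"{a}: {ex[a]}" for a in attrs)
        lines.append(f"{shown} -> {target}: {ex[target]}")
    shown = "; ".join(f"{a}: {query[a]}" for a in attrs)
    lines.append(f"{shown} -> {target}: ?")
    return "\n".join(lines)

# Made-up rows; the last one has a missing 'cuisine' value.
ROWS = [
    {"name": "Luigi's Trattoria", "description": "wood fired pizza and fresh pasta",
     "phone": "555-0101", "cuisine": "Italian"},
    {"name": "Sakura House", "description": "sushi rolls and ramen bowls",
     "phone": "555-0102", "cuisine": "Japanese"},
    {"name": "Roma Kitchen", "description": "fresh pasta and pizza by the slice",
     "phone": "555-0103", "cuisine": "Italian"},
    {"name": "Tokyo Bowl", "description": "ramen bowls and sushi platters",
     "phone": "555-0104", "cuisine": "Japanese"},
    {"name": "Napoli Express", "description": "pizza and pasta to go",
     "phone": "555-0105", "cuisine": None},
]

query = ROWS[-1]
attrs = select_attributes(ROWS, "cuisine")
examples = retrieve_tuples(ROWS, query, attrs, "cuisine")
prompt = build_prompt(query, attrs, examples, "cuisine")
print(prompt)
```

In this toy table only the description column survives attribute selection, and the two retrieved tuples are the rows whose descriptions share the most tokens with the incomplete row. The selected attributes and retrieved tuples double as a simple form of attribution, since they are exactly the data the prompt exposes.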

Code Implementations: 1 repo