DB · Apr 29

PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search

arXiv:2604.2635675.3
AI Analysis

This work addresses the practical need for accurate schema matching in data lakes with anonymized data, offering a privacy-compliant solution for data integration.

PiLLar introduces the first framework for matching pivot table schemas to relational tables, a novel joint schema-value matching task. Using LLM-guided Monte-Carlo Tree Search, it achieves an average accuracy of 87.94% on correctly predicted matches on PTbench, a new benchmark, and adapts across domains without training.
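The summary names LLM-guided Monte-Carlo Tree Search but does not spell out the algorithm. As a hedged illustration of the generic technique only (standard UCT-based MCTS, not PiLLar's actual procedure), the sketch below searches over assignments of pivot-table columns to relational attributes. The toy columns, the target schema, and `llm_score` are hypothetical; `llm_score` is a hand-written heuristic standing in for the LLM call that would score candidate matches.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy instance: a pivot table whose year headers were pivoted out
# of a relational attribute, and the relational schema to match against.
PIVOT_COLUMNS = ["region", "2021", "2022", "2023"]
TARGET_ATTRIBUTES = ["region", "year", "sales"]


def llm_score(assignment: tuple) -> float:
    """Hypothetical stand-in for an LLM judge, returning a reward in [0, 1].

    A real system would prompt an LLM; this heuristic rewards schema-level
    consistency (identical names) and value-level compatibility (year-like
    headers mapped to "year").
    """
    hits = 0
    for col, attr in zip(PIVOT_COLUMNS, assignment):
        if col == attr or (col.isdigit() and attr == "year"):
            hits += 1
    return hits / len(PIVOT_COLUMNS)


@dataclass(eq=False)
class Node:
    assignment: tuple                      # attributes chosen for a prefix of PIVOT_COLUMNS
    parent: Optional["Node"] = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    value: float = 0.0                     # sum of rewards backed up through this node

    def is_terminal(self) -> bool:
        return len(self.assignment) == len(PIVOT_COLUMNS)

    def untried(self) -> list:
        return [a for a in TARGET_ATTRIBUTES if a not in self.children]


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCT: mean reward plus an exploration bonus for rarely visited children."""
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations: int = 5000, seed: int = 0) -> tuple:
    rng = random.Random(seed)
    root = Node(assignment=())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCT.
        while not node.is_terminal() and not node.untried():
            parent = node
            node = max(parent.children.values(), key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: attach one untried child.
        if not node.is_terminal():
            attr = rng.choice(node.untried())
            child = Node(assignment=node.assignment + (attr,), parent=node)
            node.children[attr] = child
            node = child
        # 3. Simulation: complete the assignment at random and score it.
        rollout = list(node.assignment)
        while len(rollout) < len(PIVOT_COLUMNS):
            rollout.append(rng.choice(TARGET_ATTRIBUTES))
        reward = llm_score(tuple(rollout))
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited path as the predicted match.
    node, best = root, []
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        best.append(node.assignment[-1])
    return tuple(best)


print(mcts())
```

Running the script prints the assignment read off the most-visited path. Replacing `llm_score` with a real LLM prompt would make every rollout far more expensive, which is one plausible reason a search that concentrates its visits on promising branches suits this setting.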

Pivot tables are ubiquitous in data lakes of modern data ecosystems, making accurate schema matching over pivot tables a key prerequisite for data integration. In this paper, we focus on matching for pivot table schema, a novel joint schema-value matching task. It aims to align schemas between pivot tables and standard relational tables, where a correct match must be semantically consistent at the schema level and compatible at the value level. However, due to the inherent data sensitivity of this task, the prevalence of anonymized data in practice poses significant challenges to its matching accuracy and generalization capability. To tackle these challenges, we propose PiLLar, the first framework for matching for pivot table schema. We first formulate PiLLar as an LLM-driven search paradigm that operates with minimal annotated, privacy-compliant data, thereby achieving training-free adaptation across diverse domains. Next, we provide a theoretical analysis of the error dynamics of the paradigm to establish the asymptotic convergence of the proposed method. Furthermore, we introduce a new benchmark, PTbench, derived from four representative real-world domains and constructed by mining unpivot-suitable tables, performing unpivot on semantically coherent attributes, and applying sampling and anonymization. Extensive experiments demonstrate the superiority of PiLLar, which achieves an average accuracy of 87.94% on the correctly predicted matches.
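The abstract describes how PTbench pairs are built: unpivot-suitable tables are mined, semantically coherent attributes are unpivoted, and the result is sampled and anonymized. The sketch below walks one toy table through those steps in plain Python. The table contents, the year-header heuristic for picking coherent attributes, and the salted-hash anonymization are all assumptions for illustration, not PTbench's actual procedure. It also shows why the task is joint schema-value matching: after unpivoting, the pivot table's headers 2021 to 2023 reappear as values of the `year` attribute.

```python
import hashlib
import random

# Hypothetical wide ("pivot") table: one row per region, one column per year.
WIDE_ROWS = [
    {"region": "North", "2021": 120, "2022": 135, "2023": 150},
    {"region": "South", "2021": 90, "2022": 95, "2023": 110},
    {"region": "East", "2021": 70, "2022": 80, "2023": 85},
]


def coherent_value_columns(columns):
    """Assumed heuristic: treat year-like headers as one semantically coherent group."""
    return [c for c in columns if c.isdigit() and len(c) == 4]


def unpivot(rows, id_cols, value_cols, var_name, value_name):
    """Emit one long-format record per (row, value column) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            record = {c: row[c] for c in id_cols}
            record[var_name] = col          # the header becomes a value of var_name
            record[value_name] = row[col]   # the cell becomes a value of value_name
            long_rows.append(record)
    return long_rows


def anonymize(rows, columns, salt="demo-salt"):
    """Replace values in `columns` with salted SHA-256 tokens (illustrative only)."""
    out = []
    for row in rows:
        new = dict(row)
        for c in columns:
            digest = hashlib.sha256((salt + str(row[c])).encode("utf-8")).hexdigest()
            new[c] = "anon_" + digest[:8]
        out.append(new)
    return out


columns = list(WIDE_ROWS[0])
value_cols = coherent_value_columns(columns)
id_cols = [c for c in columns if c not in value_cols]
relational = unpivot(WIDE_ROWS, id_cols, value_cols, var_name="year", value_name="sales")
sampled = random.Random(0).sample(relational, k=4)
anonymized = anonymize(sampled, columns=["region"])

# Ground-truth match for this pair: "region" maps to "region" at the schema
# level, and each year header maps to a value of "year" (its cells to "sales").
ground_truth = {"region": "region", **{c: "year" for c in value_cols}}
```

A salted hash maps equal inputs to equal tokens, so rows that shared a region still share a token after anonymization. Strictly speaking this is pseudonymization; a privacy-compliant pipeline may well need stronger guarantees.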
