CL · Jun 4, 2025

Relationship Detection on Tabular Data Using Statistical Analysis and Large Language Models

arXiv:2506.06371v2 · 1 citation · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses column pair annotation for tabular data interpretation. The advance is incremental: it builds on existing methods by integrating statistical constraints with LLMs.

The paper tackles the problem of detecting relationships among columns in unlabeled tabular data using a hybrid approach that combines large language models with statistical analysis to reduce search space, achieving competitive results on benchmark datasets from the SemTab challenge.

Over the past few years, table interpretation tasks have made significant progress, driven by their importance and by the introduction of new technologies and benchmarks in the field. This work experiments with a hybrid approach for detecting relationships among columns of unlabeled tabular data, using a Knowledge Graph (KG) as a reference point, a task known as CPA. The approach leverages large language models (LLMs) while employing statistical analysis to reduce the search space of potential KG relations. The main modules for reducing the search space are domain and range constraint detection and relation co-appearance analysis. The experimental evaluation on two benchmark datasets provided by the SemTab challenge assesses the influence of each module, the effectiveness of different state-of-the-art LLMs at various levels of quantization, and the effect of different prompting techniques. The proposed methodology, which is publicly available on GitHub, proved to be competitive with state-of-the-art approaches on these datasets.
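As a rough illustration of how statistical pruning could shrink the set of KG relations before an LLM is asked to choose among them, the sketch below filters candidates by domain and range and then orders the survivors by co-appearance with relations already assigned in the same table. It is not the paper's implementation: the `Relation` class, both function names, the type labels, the relation names, and the co-appearance counts are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A hypothetical KG relation with its expected subject and object types."""
    name: str
    domain: str  # expected type of the subject column
    range: str   # expected type of the object column


def prune_by_domain_range(relations, subject_type, object_type):
    """Keep only relations whose domain and range match the detected column types."""
    return [
        r for r in relations
        if r.domain == subject_type and r.range == object_type
    ]


def rank_by_co_appearance(candidates, already_assigned, co_counts):
    """Order candidates by how often they co-appear with relations
    already assigned to other column pairs of the same table."""
    def score(r):
        return sum(
            co_counts.get(frozenset((r.name, a)), 0) for a in already_assigned
        )
    return sorted(candidates, key=score, reverse=True)


# Toy data (invented for illustration only).
relations = [
    Relation("deathPlace", "Person", "Place"),
    Relation("birthPlace", "Person", "Place"),
    Relation("capital", "Country", "City"),
    Relation("author", "Book", "Person"),
]
co_counts = {
    frozenset(("birthPlace", "birthDate")): 12,
    frozenset(("deathPlace", "birthDate")): 3,
}

candidates = prune_by_domain_range(relations, "Person", "Place")
shortlist = rank_by_co_appearance(candidates, ["birthDate"], co_counts)
shortlist_names = [r.name for r in shortlist]
print(shortlist_names)
```

In a pipeline along these lines, only the shortlisted relation names would be placed in the LLM prompt, so the model chooses among a handful of plausible relations instead of the full KG vocabulary.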

