CLJun 25, 2025

Towards Probabilistic Question Answering Over Tabular Data

arXiv:2506.20747v11 citationsh-index: 7
Originality Incremental advance
AI Analysis

This addresses a gap in question answering for uncertain reasoning over tables, which is incremental as it builds on hybrid symbolic-neural approaches.

The paper tackles the problem of answering probabilistic questions over tabular data, where existing methods like NL2SQL systems fail, by introducing a new benchmark LUCARIO and a framework that uses Bayesian Networks and LLMs, resulting in significant improvements over baselines.

Current approaches for question answering (QA) over tabular data, such as NL2SQL systems, perform well for factual questions where answers are directly retrieved from tables. However, they fall short on probabilistic questions requiring reasoning under uncertainty. In this paper, we introduce a new benchmark LUCARIO and a framework for probabilistic QA over large tabular data. Our method induces Bayesian Networks from tables, translates natural language queries into probabilistic queries, and uses large language models (LLMs) to generate final answers. Empirical results demonstrate significant improvements over baselines, highlighting the benefits of hybrid symbolic-neural reasoning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes