AI | Nov 4, 2025

TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

arXiv:2511.02219v2 | 5 citations | h-index: 5 | EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of improving large language models for complex numerical reasoning over tables in data analysis. The advance is incremental, since it builds on existing methods such as program-of-thoughts.

The paper tackles complex numerical reasoning over tabular data, where large language models often underperform. It proposes TabDSR, a framework that decomposes queries, sanitizes tables, and applies program-of-thoughts reasoning, and it reports state-of-the-art performance with accuracy improvements of 8.79%, 6.08%, and 19.87% on three benchmark datasets.

Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvement on TAT-QA, TableBench, and CalTab151, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning. These findings highlight the effectiveness of our framework in enhancing LLM performance for complex tabular numerical reasoning. Data and code are available upon request.
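The three stages named in the abstract (decompose, sanitize, reason with executable code) can be illustrated with a small sketch. This is not the authors' implementation: the function names `decompose`, `sanitize_table`, and `run_program`, the toy table, and the hand-written "generated" program are all assumptions made for illustration, and the LLM calls that the framework would make are replaced here by simple stand-ins.

```python
import re


def sanitize_cell(cell):
    """Turn a noisy numeric string into a float; leave other text unchanged.

    Assumption: currency signs, thousands separators, percent signs, and
    accounting-style parentheses are the only noise. A "%" is simply dropped,
    so "12%" becomes 12.0 rather than 0.12.
    """
    text = str(cell).strip()
    negative = text.startswith("(") and text.endswith(")")  # "(300)" means -300
    cleaned = re.sub(r"[,$%()\s]", "", text)
    try:
        value = float(cleaned)
    except ValueError:
        return text  # not numeric, keep the original text
    return -value if negative else value


def sanitize_table(rows):
    """Strip whitespace from header names and sanitize every cell."""
    return [{key.strip(): sanitize_cell(val) for key, val in row.items()} for row in rows]


def decompose(question):
    """Hypothetical stand-in for an LLM query decomposer: split on ' and then '."""
    return [part.strip() for part in question.split(" and then ") if part.strip()]


def run_program(program, table):
    """Program-of-thoughts step: execute generated code that assigns `answer`.

    exec() on untrusted model output is unsafe; a real system would sandbox it.
    """
    scope = {"table": table}
    exec(program, scope)
    return scope["answer"]


# Toy noisy table (illustrative data, not taken from any benchmark).
raw_table = [
    {"year ": "2022", "revenue": "$1,200.5"},
    {"year ": "2023", "revenue": " 1,500.5 "},
    {"year ": "2024", "revenue": "(300)"},
]

table = sanitize_table(raw_table)
sub_questions = decompose("Sum revenue for 2022 and 2023 and then subtract the 2024 loss")

# Hand-written stand-in for the code an LLM reasoner would generate.
program = """
gains = sum(r["revenue"] for r in table if r["year"] in (2022.0, 2023.0))
loss = next(r["revenue"] for r in table if r["year"] == 2024.0)
answer = gains + loss
"""

answer = run_program(program, table)
print(sub_questions)
print(answer)
```

In this sketch, sanitization runs before code execution so that the generated program can do plain arithmetic on floats instead of parsing strings such as "$1,200.5" itself.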

