LG · Mar 14, 2025

RePanda: Pandas-powered Tabular Verification and Reasoning

Atoosa Malemir Chegini, Keivan Rezaei, Hamid Eghbalzadeh, Soheil Feizi

arXiv:2503.11921v2 · 6 citations · h-index: 49 · ACL

Originality: Incremental advance

AI Analysis

This work addresses the need for transparent, verifiable fact-checking over structured data, though the advance is incremental because it builds on existing datasets and models.

The paper tackles fact-checking of tabular data by introducing RePanda, which translates claims into executable pandas queries for interpretable reasoning. It achieves 84.09% accuracy on TabFact and 84.72% on an OOD dataset.

Fact-checking tabular data is essential for ensuring the accuracy of structured information. However, existing methods often rely on black-box models with opaque reasoning. We introduce RePanda, a structured fact verification approach that translates claims into executable pandas queries, enabling interpretable and verifiable reasoning. To train RePanda, we construct PanTabFact, a structured dataset derived from the TabFact train set, where claims are paired with executable queries generated using DeepSeek-Chat and refined through automated error correction. Fine-tuning DeepSeek-coder-7B-instruct-v1.5 on PanTabFact, RePanda achieves 84.09% accuracy on the TabFact test set. To evaluate Out-of-Distribution (OOD) generalization, we interpret question-answer pairs from WikiTableQuestions as factual claims and refer to this dataset as WikiFact. Without additional fine-tuning, RePanda achieves 84.72% accuracy on WikiFact, significantly outperforming all other baselines and demonstrating strong OOD robustness. Notably, these results closely match the zero-shot performance of DeepSeek-Chat (671B), indicating that our fine-tuning approach effectively distills structured reasoning from a much larger model into a compact, locally executable 7B model. Beyond fact verification, RePanda extends to tabular question answering by generating executable queries that retrieve precise answers. To support this, we introduce PanWiki, a dataset mapping WikiTableQuestions to pandas queries. Fine-tuning on PanWiki, RePanda achieves 75.1% accuracy in direct answer retrieval. These results highlight the effectiveness of structured execution-based reasoning for tabular verification and question answering. We have publicly released the dataset on Hugging Face at datasets/AtoosaChegini/PanTabFact.
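The core idea of execution-based verification can be illustrated with a minimal sketch. The table, the claims, the query strings, and the `verify` helper below are all invented for illustration and are not taken from RePanda or PanTabFact; the sketch only assumes that a model emits a boolean pandas expression over a DataFrame named `df`, and that the verdict is obtained by executing that expression.

```python
import pandas as pd

# Toy table, invented for illustration (not a TabFact table).
df = pd.DataFrame({
    "team": ["Lions", "Tigers", "Bears"],
    "wins": [10, 7, 12],
    "losses": [4, 7, 2],
})

# Claim: "The Bears have the most wins."
# Hypothetical query string standing in for what a model might generate.
query = "df.loc[df['wins'].idxmax(), 'team'] == 'Bears'"


def verify(claim_query: str, table: pd.DataFrame) -> bool:
    """Evaluate a boolean pandas expression against a table.

    Hypothetical helper. eval() with an emptied builtins namespace is used
    only to keep the example short; it is not a security sandbox.
    """
    result = eval(claim_query, {"__builtins__": {}}, {"df": table, "pd": pd})
    return bool(result)


# The executed query is the explanation: it can be read and re-run.
verdict = verify(query, df)

# Claim: "Some team has more losses than wins." (false for this table)
refuted = verify("(df['losses'] > df['wins']).any()", df)

print(verdict, refuted)
```

Because the verdict comes from running the query rather than from a model's free-form judgment, a reader can inspect the expression and rerun it against the table to check the result.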
