TraceBack: Multi-Agent Decomposition for Fine-Grained Table Attribution
This work addresses the need for verifiable grounding in table-based QA, which is crucial for high-stakes applications. Its contribution is incremental: it improves on existing attribution methods and adds new evaluation tools.
The paper tackles fine-grained, cell-level attribution for answers in table question answering, with the goal of improving transparency and trust. It introduces TraceBack, a multi-agent framework that outperforms baselines across datasets, and proposes FairScore, a metric for evaluating attribution that closely aligns with human judgments.
Question answering (QA) over structured tables requires not only accurate answers but also transparency about which cells support them. Existing table QA systems rarely provide fine-grained attribution, so even correct answers often lack verifiable grounding, limiting trust in high-stakes settings. We address this with TraceBack, a modular multi-agent framework for scalable, cell-level attribution in single-table QA. TraceBack prunes tables to relevant rows and columns, decomposes questions into semantically coherent sub-questions, and aligns each answer span with its supporting cells, capturing both explicit and implicit evidence used in intermediate reasoning steps. To enable systematic evaluation, we release CITEBench, a benchmark with phrase-to-cell annotations drawn from ToTTo, FetaQA, and AITQA. We further propose FairScore, a reference-less metric that compares atomic facts derived from predicted cells and answers to estimate attribution precision and recall without human cell labels. Experiments show that TraceBack substantially outperforms strong baselines across datasets and granularities, while FairScore closely tracks human judgments and preserves relative method rankings, supporting interpretable and scalable evaluation of table-based QA.
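To make the idea of estimating attribution precision and recall from atomic facts concrete, here is a minimal sketch. It is not the paper's FairScore implementation. It assumes the atomic facts have already been extracted as short strings, and it assumes two facts match only when they are identical after simple normalization. The abstract does not say how facts are extracted or matched, and a real system would likely use a more tolerant semantic comparison. The names `normalize` and `fact_overlap_scores` are hypothetical.

```python
from typing import Iterable, Set, Tuple


def normalize(fact: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(fact.lower().split())


def fact_overlap_scores(
    cell_facts: Iterable[str], answer_facts: Iterable[str]
) -> Tuple[float, float]:
    """Return (precision, recall) of cell-derived facts against answer facts.

    Assumption: exact match after normalization stands in for whatever
    fact-matching procedure the actual metric uses.
    """
    cells: Set[str] = {normalize(f) for f in cell_facts}
    answer: Set[str] = {normalize(f) for f in answer_facts}
    shared = cells & answer

    # Precision: share of facts from the predicted cells that the answer uses.
    precision = len(shared) / len(cells) if cells else 0.0
    # Recall: share of answer facts that the predicted cells support.
    recall = len(shared) / len(answer) if answer else 0.0
    return precision, recall


# Illustrative, made-up facts: three from predicted cells, two from the answer.
cell_facts = [
    "Team A scored 3 goals",
    "Team A played in 2019",
    "Team B scored 1 goal",
]
answer_facts = ["team a scored 3 goals", "Team B scored 1 goal"]
precision, recall = fact_overlap_scores(cell_facts, answer_facts)
```

In this toy case one predicted cell contributes a fact the answer never uses, so precision falls below 1 while recall stays at 1. This is the pattern expected when the attribution covers all supporting cells but also cites extra ones.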