CL · Sep 20, 2023

Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables

Wenting Zhao, Ye Liu, Yao Wan, Yibo Wang, Zhongfen Deng, Philip S. Yu

Salesforce

arXiv:2309.11049v2 · 21.2128 citations · h-index: 167

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of generating coherent and faithful free-form answers from tabular data, which matters for applications such as data analysis and natural language interfaces. The advance is incremental, since it builds on existing TableQA methods.

The paper tackles free-form question answering over tables, which requires reasoning across diverse cells to generate long answers. It proposes a three-stage approach, TAG-QA, that localizes relevant cells, retrieves external knowledge, and fuses the two sources. TAG-QA outperforms state-of-the-art baselines, improving on TAPAS by 17% in BLEU-4 and 14% in PARENT F-score, and on T5 by 16% and 12%, respectively.

Question answering on tabular data (a.k.a. TableQA), which aims at generating answers to questions grounded on a provided table, has gained significant attention recently. Prior work primarily produces concise factual responses through information extraction from individual or limited table cells, lacking the ability to reason across diverse table cells. Yet, the realm of free-form TableQA, which demands intricate strategies for selecting relevant table cells and the sophisticated integration and inference of discrete data fragments, remains mostly unexplored. To this end, this paper proposes a generalized three-stage approach: Table-to-Graph conversion and cell localizing, external knowledge retrieval, and the fusion of table and text (called TAG-QA), to address the challenge of inferring long free-form answers in generative TableQA. In particular, TAG-QA (1) locates relevant table cells using a graph neural network to gather intersecting cells between relevant rows and columns, (2) leverages external knowledge from Wikipedia, and (3) generates answers by integrating both tabular data and natural linguistic information. Experiments showcase the superior capabilities of TAG-QA in generating sentences that are both faithful and coherent, particularly when compared to several state-of-the-art baselines. Notably, TAG-QA surpasses the robust pipeline-based baseline TAPAS by 17% and 14% in terms of BLEU-4 and PARENT F-score, respectively. Furthermore, TAG-QA outperforms the end-to-end model T5 by 16% and 12% on BLEU-4 and PARENT F-score, respectively.
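
The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the row and column relevance scores below are hypothetical stand-ins for what the paper obtains from a graph neural network, the token-overlap ranking is a simple placeholder for retrieval over Wikipedia, and the fused string is only one plausible way to linearize cells and text for a generator. All function names, the threshold, and the sample data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def localize_cells(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    row_scores: Sequence[float],
    col_scores: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep cells at the intersection of a relevant row and a relevant column.

    The scores are assumed inputs; in the paper they would come from a GNN.
    """
    keep_rows = [i for i, s in enumerate(row_scores) if s >= threshold]
    keep_cols = [j for j, s in enumerate(col_scores) if s >= threshold]
    return [(header[j], rows[i][j]) for i in keep_rows for j in keep_cols]


def retrieve_passages(question: str, passages: Sequence[str], k: int = 1) -> List[str]:
    """Rank passages by token overlap with the question (placeholder retriever)."""
    q_tokens = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_tokens & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def fuse(question: str, cells: Sequence[Tuple[str, str]], passages: Sequence[str]) -> str:
    """Linearize the question, selected cells, and retrieved text into one input."""
    cell_text = " ; ".join(f"{col} is {val}" for col, val in cells)
    return f"question: {question} table: {cell_text} context: {' '.join(passages)}"


# Hypothetical sample data.
header = ["Player", "Team", "Goals"]
rows = [["Ann", "Reds", "12"], ["Bo", "Blues", "7"]]
question = "How many goals did Ann score?"
passages = [
    "Ann is a forward who plays for the Reds.",
    "The Blues were founded in 1950.",
]

cells = localize_cells(header, rows, row_scores=[0.9, 0.2], col_scores=[0.8, 0.1, 0.7])
top = retrieve_passages(question, passages, k=1)
prompt = fuse(question, cells, top)
print(prompt)
```

In a full system, the fused string would be passed to a sequence-to-sequence generator to produce the free-form answer; that step is omitted here.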
