CL LG | Dec 16, 2024

Interpretable LLM-based Table Question Answering

Giang Nguyen, Ivan Brugere, Shubham Sharma, Sanjay Kariyappa, Anh Totti Nguyen, Freddy Lecue

arXiv:2412.12386v312.225 citations | h-index: 7 | Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretable Table QA in high-stakes domains such as finance and healthcare, offering an incremental improvement over existing methods.

The paper tackles the problem of interpretability in Table Question Answering (Table QA) by proposing Plan-of-SQLs (POS), a method that decomposes questions into atomic SQL steps for transparent decision-making. The results show POS achieves competitive QA accuracy on benchmarks like TabFact and WikiTQ while requiring up to 25x fewer LLM calls and table queries, with high agreement (up to 90.59%) between LLMs and humans in evaluating explanations.

Interpretability in Table Question Answering (Table QA) is critical, especially in high-stakes domains like finance and healthcare. While recent Table QA approaches based on Large Language Models (LLMs) achieve high accuracy, they often produce ambiguous explanations of how answers are derived. We propose Plan-of-SQLs (POS), a new Table QA method that makes the model's decision-making process interpretable. POS decomposes a question into a sequence of atomic steps, each directly translated into an executable SQL command on the table, thereby ensuring that every intermediate result is transparent. Through extensive experiments, we show the following. First, POS generates the highest-quality explanations among the compared methods, which markedly improves users' ability to simulate and verify the model's decisions. Second, when evaluated on standard Table QA benchmarks (TabFact, WikiTQ, and FeTaQA), POS achieves QA accuracy that is competitive with existing methods while also being more efficient, requiring significantly fewer LLM calls and table database queries (up to 25x fewer), and performing more robustly on large tables. Finally, we observe high agreement (up to 90.59% in forward simulation) between LLMs and human users when making decisions based on the same explanations, suggesting that LLMs could serve as an effective proxy for humans in evaluating Table QA explanations.
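The step-by-step decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in POS an LLM produces the plan and the SQL, whereas here the plan is written by hand, and the table, its column names, the example statement, and the helper `run_plan` are all hypothetical. The sketch only shows how executing one atomic SQL command per step leaves every intermediate table available for inspection.

```python
import sqlite3


def run_plan(rows, steps):
    """Run each atomic SQL step in order and record its intermediate table.

    Hypothetical helper: step i reads table t{i-1} and its output is stored as t{i}.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema for the input table t0.
    conn.execute("CREATE TABLE t0 (team TEXT, points INTEGER, year INTEGER)")
    conn.executemany("INSERT INTO t0 VALUES (?, ?, ?)", rows)

    trace = []
    for i, (description, select_sql) in enumerate(steps, start=1):
        # Materialize this step's output so the next step can query it.
        conn.execute(f"CREATE TABLE t{i} AS {select_sql}")
        result = conn.execute(f"SELECT * FROM t{i}").fetchall()
        trace.append((description, select_sql, result))
    conn.close()
    return trace


# Hypothetical statement to verify:
# "The team with the most points in 2020 scored more than 80 points."
ROWS = [
    ("Lions", 72, 2020),
    ("Tigers", 91, 2020),
    ("Bears", 95, 2019),
    ("Wolves", 64, 2020),
]

# Hand-written plan: one natural-language step paired with one SQL command.
STEPS = [
    ("Keep rows from 2020", "SELECT * FROM t0 WHERE year = 2020"),
    ("Keep the row with the most points",
     "SELECT * FROM t1 ORDER BY points DESC LIMIT 1"),
    ("Check whether points exceed 80",
     "SELECT points > 80 AS verdict FROM t2"),
]

trace = run_plan(ROWS, STEPS)
for description, sql, result in trace:
    print(description)
    print("  SQL:   ", sql)
    print("  Result:", result)
```

Because each step is a single SQL command over the previous step's output, a reader can check the final verdict by following the recorded intermediate tables one at a time, which is the kind of verification the abstract refers to.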
