CL | May 13, 2024

TANQ: An open domain dataset of table answered questions

Mubashara Akhtar, Chenxi Pang, Andreea Marzoca, Yasemin Altun, Julian Martin Eisenschlos

DeepMind

arXiv:2405.07765v3 | 6.67 citations | h-index: 8 | Has Code | TACL

Originality: Incremental advance

AI Analysis

This dataset targets complex, multi-source question answering. The advance is incremental: it builds on existing QA tasks by requiring the answer to be generated as a table.

The authors introduced TANQ, the first open-domain question answering dataset requiring table construction from multiple sources, and benchmarked state-of-the-art language models, with the best baseline achieving an F1 score of 60.7, lagging 12.3 points behind human performance.

Language models, potentially augmented with tool usage such as retrieval, are becoming the go-to means of answering questions. Understanding and answering questions in real-world settings often requires retrieving information from different sources, processing and aggregating data to extract insights, and presenting complex findings in the form of structured artifacts such as novel tables, charts, or infographics. In this paper, we introduce TANQ, the first open-domain question answering dataset where the answers require building tables from information across multiple sources. We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups. Our best-performing baseline, Gemini Flash, reaches an overall F1 score of 60.7, lagging behind human performance by 12.3 points. We analyse baselines' performance across different dataset attributes, such as the skills required for this task, including multi-hop reasoning, math operations, and unit conversions. We further discuss common failures in model-generated answers, suggesting that TANQ is a complex task with many challenges ahead.
