CL · Aug 17, 2024

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

arXiv:2408.09174v2 · 152 citations · h-index: 18 · Has Code
AI Analysis

This work addresses the challenge of applying LLMs to complex industrial table data, though it is incremental as it builds on existing benchmarks and models.

The authors tackled the gap between academic benchmarks and real-world table question answering by creating TableBench, a comprehensive benchmark with 18 fields across four categories, and introduced TableLLM, which achieved performance comparable to GPT-3.5.

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges when applied in industrial scenarios, particularly because real-world tabular data demands more complex reasoning, which underscores a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark, TableBench, covering 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, which achieves performance comparable to GPT-3.5. Extensive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands: even the most advanced model, GPT-4, achieves only a modest score compared to humans.
