CL | Apr 9, 2025

NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables

arXiv:2504.06560v4 | 9 citations | h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses how LLM developers and researchers can assess genuine long-context table understanding. It is an incremental advance, since it extends existing needle-in-a-haystack concepts to structured data.

The authors evaluate how well large language models understand long structured tables by introducing the NeedleInATable benchmark. It reveals a substantial performance gap between popular downstream tasks and the simpler cell-extraction task, suggesting that models may rely on shortcuts rather than robust understanding.

Processing structured tabular data, particularly large and lengthy tables, constitutes a fundamental yet challenging task for large language models (LLMs). However, existing long-context benchmarks like Needle-in-a-Haystack primarily focus on unstructured text, neglecting the challenge of diverse structured tables. Meanwhile, previous tabular benchmarks mainly consider downstream tasks that require high-level reasoning abilities, and overlook models' underlying fine-grained perception of individual table cells, which is crucial for practical and robust LLM-based table applications. To address this gap, we introduce NeedleInATable (NIAT), a new long-context tabular benchmark that treats each table cell as a "needle" and requires models to extract the target cell based on cell locations or lookup questions. Our comprehensive evaluation of various LLMs and multimodal LLMs reveals a substantial performance gap between popular downstream tabular tasks and the simpler NIAT task, suggesting that these models may rely on dataset-specific correlations or shortcuts to obtain better benchmark results but lack truly robust long-context understanding of structured tables. Furthermore, we demonstrate that using synthesized NIAT training data can effectively improve performance on both the NIAT task and downstream tabular tasks, which validates the importance of NIAT capability for LLMs' genuine table understanding ability.
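To make the task format concrete, here is a minimal sketch of how a cell-level probe of this kind could be built. It is not the authors' code: the Markdown serialization, the question wording, the function names (`table_to_markdown`, `make_location_probe`, `make_lookup_probe`, `exact_match`), and the toy table are all assumptions made for illustration. The abstract specifies only that a target cell is requested either by its location or through a lookup question.

```python
# Illustrative sketch only: prompt wording, serialization, and toy data are
# assumptions, not the benchmark's actual format.

HEADER = ["id", "name", "score"]
ROWS = [
    ["r1", "alpha", 10],
    ["r2", "beta", 20],
    ["r3", "gamma", 30],
]


def table_to_markdown(header, rows):
    """Serialize a table as a Markdown grid, one line per row."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def make_location_probe(header, rows, row_idx, col_idx):
    """Return (prompt, gold) asking for one cell by its 1-based position."""
    question = (
        f"What is the value in data row {row_idx + 1}, "
        f"column {col_idx + 1} of the table above?"
    )
    prompt = table_to_markdown(header, rows) + "\n\n" + question
    return prompt, str(rows[row_idx][col_idx])


def make_lookup_probe(header, rows, row_idx, col_idx, key_col=0):
    """Return (prompt, gold) asking for one cell via a key-column lookup."""
    question = (
        f"In the table above, what is the {header[col_idx]} of the row "
        f"whose {header[key_col]} is {rows[row_idx][key_col]}?"
    )
    prompt = table_to_markdown(header, rows) + "\n\n" + question
    return prompt, str(rows[row_idx][col_idx])


def exact_match(prediction, gold):
    """Score a model answer by case-insensitive exact match."""
    return prediction.strip().lower() == gold.strip().lower()
```

Under these assumptions, every cell of a table yields one probe of each kind, so a long table produces many needles at different positions. A model's answer to each prompt would then be scored against the gold cell value with `exact_match`.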

