CV, CL | Sep 25, 2025

TABLET: A Large-Scale Dataset for Robust Visual Table Understanding

arXiv:2509.21205v2 | 3 citations | h-index: 86
Originality: Synthesis-oriented
AI Analysis

TABLET addresses the need for more realistic and diverse training data for visual table understanding models, though it is primarily an incremental dataset contribution rather than a methodological breakthrough.

The authors address two limitations of existing visual table understanding (VTU) datasets: limited visual diversity and fixed examples that cannot be reformulated. They introduce TABLET, a large-scale dataset of 4 million examples across 20 tasks, grounded in 2 million unique tables, 88% of which preserve their original visualizations. Fine-tuning vision-language models on TABLET improves performance on both seen and unseen VTU tasks and increases robustness on real-world table visualizations.

While table understanding increasingly relies on pixel-only settings where tables are processed as visual representations, current benchmarks predominantly use synthetic renderings that lack the complexity and visual diversity of real-world tables. Additionally, existing visual table understanding (VTU) datasets offer fixed examples with single visualizations and pre-defined instructions, providing no access to underlying serialized data for reformulation. We introduce TABLET, a large-scale VTU dataset with 4 million examples across 20 tasks, grounded in 2 million unique tables where 88% preserve original visualizations. Each example includes paired image-HTML representations, comprehensive metadata, and provenance information linking back to the source datasets. Fine-tuning vision-language models like Qwen2.5-VL-7B on TABLET improves performance on seen and unseen VTU tasks while increasing robustness on real-world table visualizations. By preserving original visualizations and maintaining example traceability in a unified large-scale collection, TABLET establishes a foundation for robust training and extensible evaluation of future VTU models.
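To make the "paired image-HTML" and "reformulation" points concrete, here is a minimal sketch of how such an example might be represented and turned into a new task. The `TableExample` fields, the `cell_lookup` task name, and the toy table are all assumptions made for illustration; they are not TABLET's actual schema or data.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class TableExample:
    """Hypothetical record layout; field names are illustrative, not TABLET's schema."""
    image_path: str        # rendered table image (the pixel-only model input)
    html: str              # serialized table paired with the image
    task: str
    instruction: str
    answer: str
    source_dataset: str    # provenance: which dataset the example came from
    source_id: str         # provenance: identifier within that dataset
    metadata: dict = field(default_factory=dict)


class _CellCollector(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def html_to_rows(html):
    """Parse a simple HTML table (no row or column spans) into a list of rows."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


def reformulate_as_cell_lookup(example, row, col):
    """Derive a new question from the HTML while keeping image and provenance."""
    rows = html_to_rows(example.html)
    header, body = rows[0], rows[1:]
    return TableExample(
        image_path=example.image_path,
        html=example.html,
        task="cell_lookup",  # hypothetical task name
        instruction=(
            f"What is the {header[col]} in the row where "
            f"{header[0]} is {body[row][0]}?"
        ),
        answer=body[row][col],
        source_dataset=example.source_dataset,
        source_id=example.source_id,
        metadata={**example.metadata, "derived_from": example.task},
    )


# Toy example with made-up values, for illustration only.
original = TableExample(
    image_path="tables/toy.png",
    html="<table><tr><th>Item</th><th>Price</th></tr>"
         "<tr><td>pen</td><td>2</td></tr>"
         "<tr><td>book</td><td>9</td></tr></table>",
    task="table_qa",
    instruction="Which item is the most expensive?",
    answer="book",
    source_dataset="toy_source",
    source_id="toy-0001",
)
derived = reformulate_as_cell_lookup(original, row=0, col=1)
```

The derived example reuses the same image and provenance fields, which is the kind of reformulation that a dataset with fixed instructions and no serialized table would not permit.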
