CV, CL | Sep 9, 2025

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

MILA
arXiv:2509.07966v1 | 4 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a gap in how vision-language models handle complex tabular data, though the advance is incremental because it builds on existing benchmark efforts.

The authors address the lack of large-scale, diverse benchmarks for visual reasoning over table images by introducing Visual-TableQA, a dataset of 2.5k LaTeX-rendered tables and 6k QA pairs generated for under USD 100. Models fine-tuned on it generalize to external benchmarks and outperform several proprietary models.

Visual reasoning over structured data such as tables is a critical capability for modern vision-language models (VLMs), yet current benchmarks remain limited in scale, diversity, or reasoning depth, especially when it comes to rendered table images. Addressing this gap, we introduce Visual-TableQA, a large-scale, open-domain multimodal dataset specifically designed to evaluate and enhance visual reasoning over complex tabular data. Our generation pipeline is modular, scalable, and fully autonomous, involving multiple reasoning LLMs collaborating across distinct roles: generation, validation, and inspiration. Visual-TableQA comprises 2.5k richly structured LaTeX-rendered tables and 6k reasoning-intensive QA pairs, all produced at a cost of under USD 100. To promote diversity and creativity, our pipeline performs multi-model collaborative data generation via cross-model prompting ('inspiration') and LLM-jury filtering. Stronger models seed layouts and topics that weaker models elaborate, collectively distilling diverse reasoning patterns and visual structures into the dataset. Empirical results show that models fine-tuned on Visual-TableQA generalize robustly to external benchmarks, outperforming several proprietary models despite the dataset's synthetic nature. The full pipeline and resources are publicly available at https://github.com/AI-4-Everyone/Visual-TableQA.
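
The LLM-jury filtering step mentioned in the abstract can be pictured as a majority vote over candidate samples. The sketch below is only an illustration under stated assumptions, not the authors' pipeline: the `Sample` fields, the `jury_accepts` and `filter_with_jury` helpers, the 0.5 threshold, and the rule-based stand-in jurors are all hypothetical. In a real pipeline each juror would be a call to a reasoning LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One candidate item: LaTeX table source plus a QA pair (hypothetical schema)."""
    table_latex: str
    question: str
    answer: str


# A juror is any callable that returns True when it accepts a sample.
# Here jurors are simple rules; in practice each would wrap an LLM call.
Juror = Callable[[Sample], bool]


def jury_accepts(sample: Sample, jurors: List[Juror], threshold: float = 0.5) -> bool:
    """Accept the sample when the share of accepting jurors exceeds the threshold."""
    if not jurors:
        raise ValueError("at least one juror is required")
    votes = [juror(sample) for juror in jurors]
    return sum(votes) / len(votes) > threshold


def filter_with_jury(samples: List[Sample], jurors: List[Juror],
                     threshold: float = 0.5) -> List[Sample]:
    """Keep only the samples that the jury accepts."""
    return [s for s in samples if jury_accepts(s, jurors, threshold)]


# Stand-in jurors (assumptions for the demo, not real validation criteria).
def has_tabular(sample: Sample) -> bool:
    return r"\begin{tabular}" in sample.table_latex


def has_answer(sample: Sample) -> bool:
    return bool(sample.answer.strip())


def is_question(sample: Sample) -> bool:
    return sample.question.strip().endswith("?")


jurors = [has_tabular, has_answer, is_question]
table = r"\begin{tabular}{ll} a & b \\ \end{tabular}"
samples = [
    Sample(table, "Which column is larger?", "b"),  # 3 of 3 votes: kept
    Sample(table, "Sum of row one", "3"),           # 2 of 3 votes: kept
    Sample(table, "Sum of row one", ""),            # 1 of 3 votes: dropped
    Sample("plain text", "Total", ""),              # 0 of 3 votes: dropped
]
kept = filter_with_jury(samples, jurors)
print(len(kept), "of", len(samples), "samples kept")
```

A strict-majority threshold is one possible design choice; raising it toward 1.0 would trade dataset size for stricter agreement among jurors.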

