CL, AI · Jun 12, 2024

Multimodal Table Understanding

arXiv:2406.08100v1 · 56 citations · Has code
Originality: Incremental advance
AI Analysis

This addresses a practical challenge in real-world applications where high-quality textual table representations are unavailable, though it is incremental because it builds on existing multimodal large language models.

The paper tackles the problem of table understanding by proposing a multimodal approach that directly uses table images instead of requiring textual representations, and introduces Table-LLaVA, which significantly outperforms open-source baselines on 23 benchmarks.

Although great progress has been made by previous table understanding methods, including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input. However, it is difficult to access such high-quality textual table representations in some real-world scenarios, whereas table images are much more accessible. Therefore, how to directly understand tables using intuitive visual information is a crucial and urgent challenge for developing more practical applications. In this paper, we propose a new problem, multimodal table understanding, where the model needs to generate correct responses to various table-related requests based on the given table image. To facilitate both model training and evaluation, we construct a large-scale dataset named MMTab, which covers a wide spectrum of table images, instructions and tasks. On this basis, we develop Table-LLaVA, a generalist tabular multimodal large language model (MLLM), which significantly outperforms recent open-source MLLM baselines on 23 benchmarks under held-in and held-out settings. The code and data are available at https://github.com/SpursGoZmy/Table-LLaVA
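The abstract notes that prior text-based methods need the table serialized into a text sequence such as Markdown or HTML before it reaches the model. The snippet below is a minimal Python sketch of that serialization step, assuming a simple table with a single header row and no merged cells. The function names `to_markdown` and `to_html` and the toy data are hypothetical illustrations, not taken from the Table-LLaVA code.

```python
from html import escape


def to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Serialize a simple table (one header row, no merged cells) as a Markdown pipe table."""

    def cell(text: str) -> str:
        # A literal pipe would split the cell, so escape it.
        return text.replace("|", r"\|")

    lines = [
        "| " + " | ".join(cell(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",  # header separator row
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(c) for c in row) + " |")
    return "\n".join(lines)


def to_html(header: list[str], rows: list[list[str]]) -> str:
    """Serialize the same table as a flat HTML <table> string."""
    head = "<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>"
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table>{head}{body}</table>"


if __name__ == "__main__":
    # Hypothetical toy table, for illustration only.
    header = ["Team", "Wins"]
    rows = [["A", "3"], ["B", "5"]]
    print(to_markdown(header, rows))
    print(to_html(header, rows))
```

A text-only LLM pipeline would place one of these strings in its prompt. In the multimodal setting the paper proposes, this step is skipped and the model receives the table image itself. The sketch does not handle merged cells or multi-row headers.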

Code Implementations: 1 repo
