CL · AI · LG · Sep 18, 2025

TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding

arXiv:2509.14671v1 · 5 citations · h-index: 14 · Has Code
Originality: Highly original
AI Analysis

This work addresses efficient and accurate table understanding for AI applications, offering a training-efficient alternative to costly multimodal fine-tuning.

The paper tackles table understanding, where existing methods either lose structural cues or struggle with fine-grained semantics. It proposes TableDART, a framework that dynamically selects the best path (Text-only, Image-only, or Fusion) for each table-query pair, and reports state-of-the-art performance among open-source models, surpassing the strongest baseline by an average of 4.02% across seven benchmarks.

Modeling semantic and structural information from tabular data remains a core challenge for effective table understanding. Existing Table-as-Text approaches flatten tables for large language models (LLMs), but lose crucial structural cues, while Table-as-Image methods preserve structure yet struggle with fine-grained semantics. Recent Table-as-Multimodality strategies attempt to combine textual and visual views, but they (1) statically process both modalities for every query-table pair within large multimodal LLMs (MLLMs), inevitably introducing redundancy and even conflicts, and (2) depend on costly fine-tuning of MLLMs. In light of this, we propose TableDART, a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models. TableDART introduces a lightweight 2.59M-parameter MLP gating network that dynamically selects the optimal path (either Text-only, Image-only, or Fusion) for each table-query pair, effectively reducing redundancy and conflicts from both modalities. In addition, we propose a novel agent to mediate cross-modal knowledge integration by analyzing outputs from text- and image-based models, either selecting the best result or synthesizing a new answer through reasoning. This design avoids the prohibitive costs of full MLLM fine-tuning. Extensive experiments on seven benchmarks show that TableDART establishes new state-of-the-art performance among open-source models, surpassing the strongest baseline by an average of 4.02%. The code is available at: https://anonymous.4open.science/r/TableDART-C52B
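
To make the routing idea concrete, here is a minimal sketch of a gate that picks one of the three paths per table-query pair. It is not the authors' implementation. The feature vector, the layer sizes, the random untrained weights, and the names `TinyGate`, `route`, `text_model`, `image_model`, and `fusion_agent` are all assumptions introduced for illustration; the abstract only states that a small MLP gate chooses among Text-only, Image-only, and Fusion, and that an agent combines the two single-modality outputs on the Fusion path.

```python
import math
import random

# The three routes named in the abstract.
PATHS = ("text_only", "image_only", "fusion")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyGate:
    """Hypothetical two-layer MLP: feature vector -> probabilities over PATHS.

    Weights are random and untrained; a real gate would be learned.
    """

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(len(PATHS))]
        self.b2 = [0.0] * len(PATHS)

    def probs(self, features):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Output layer: one logit per path.
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return softmax(logits)


def route(gate, features, text_model, image_model, fusion_agent):
    """Run only the models needed by the path the gate scores highest."""
    scores = gate.probs(features)
    path = PATHS[scores.index(max(scores))]
    if path == "text_only":
        return path, text_model()
    if path == "image_only":
        return path, image_model()
    # Fusion: run both single-modality models, then let the agent
    # select or synthesize a final answer from their outputs.
    return path, fusion_agent(text_model(), image_model())


if __name__ == "__main__":
    gate = TinyGate(in_dim=4, hidden_dim=8)
    # Stub models standing in for pretrained text and image models.
    chosen, answer = route(
        gate,
        [0.2, -0.1, 0.5, 0.3],
        text_model=lambda: "answer from text view",
        image_model=lambda: "answer from image view",
        fusion_agent=lambda t, i: f"agent picked between: {t} | {i}",
    )
    print(chosen, "->", answer)
```

The point of the sketch is the cost structure: on the single-modality paths only one model is called, which is how per-query routing avoids the redundancy of always processing both views.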
