AI | Jan 27

CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning

arXiv:2601.19193v14.42 citations | h-index: 1 | Has Code

Originality: Incremental advance

AI Analysis

CoReTab addresses the need for stronger multi-step reasoning and interpretability in multimodal table understanding. It represents a strong, specific gain rather than a broad paradigm shift.

The paper tackles insufficient accuracy and limited interpretability in multimodal table understanding by introducing CoReTab, a code-driven reasoning framework. The framework is used to curate a dataset of 115K verified samples, and the resulting model achieves gains of +6.2%, +5.7%, and +25.6% over MMTab-trained baselines on table question answering, fact verification, and table structure understanding, respectively.

Existing datasets for multimodal table understanding, such as MMTab, primarily provide short factual answers without explicit multi-step reasoning supervision. Models trained on these datasets often generate brief responses that offer insufficient accuracy and limited insight into how the models arrive at the final answer. We introduce CoReTab, a code-driven reasoning framework that produces scalable, interpretable, and automatically verifiable annotations by coupling multi-step reasoning with executable Python code. Using the CoReTab framework, we curate a dataset of 115K verified samples averaging 529 tokens per response and fine-tune open-source MLLMs through a three-stage pipeline. We evaluate the resulting model trained on CoReTab across 17 MMTab benchmarks spanning table question answering, fact verification, and table structure understanding. Our model achieves significant gains of +6.2%, +5.7%, and +25.6%, respectively, over MMTab-trained baselines, while producing transparent and verifiable reasoning traces. These results establish CoReTab as a robust and generalizable supervision framework for improving multi-step reasoning in multimodal table understanding.
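
The abstract does not spell out how annotations are verified, so the following is only a minimal sketch of what "automatically verifiable" could mean when reasoning is paired with executable Python: run the annotation's code and keep the sample only if the computed answer matches the gold answer. The `Annotation` record, the convention that the code assigns a variable named `answer`, and the `normalize` rule are all hypothetical choices made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record: a reasoning trace paired with executable code."""
    question: str
    reasoning: str
    code: str          # assumed convention: the code assigns a variable `answer`
    gold_answer: str


def normalize(value) -> str:
    """Assumed comparison rule: render whole floats as ints, trim, lowercase."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).strip().lower()


def verify(sample: Annotation) -> bool:
    """Return True only if the code runs and reproduces the gold answer."""
    namespace = {}
    try:
        # NOTE: exec on untrusted code is unsafe; a real pipeline would
        # need a sandbox and a timeout. This is illustration only.
        exec(sample.code, namespace)
    except Exception:
        return False
    if "answer" not in namespace:
        return False
    return normalize(namespace["answer"]) == normalize(sample.gold_answer)


# Toy table question: total sales across all rows.
TABLE_CODE = (
    "rows = [\n"
    "    {'city': 'Oslo', 'sales': 120},\n"
    "    {'city': 'Bergen', 'sales': 310},\n"
    "]\n"
)

good = Annotation(
    question="What is the total sales?",
    reasoning="Sum the sales column over all rows.",
    code=TABLE_CODE + "answer = sum(r['sales'] for r in rows)\n",
    gold_answer="430",
)
wrong = Annotation(
    question="What is the total sales?",
    reasoning="Take the largest sales value.",
    code=TABLE_CODE + "answer = max(r['sales'] for r in rows)\n",
    gold_answer="430",
)
crashing = Annotation(
    question="What is the total sales?",
    reasoning="Sum a column that does not exist.",
    code=TABLE_CODE + "answer = sum(r['revenue'] for r in rows)\n",
    gold_answer="430",
)

verified = [s for s in (good, wrong, crashing) if verify(s)]
```

Under these assumptions, only `good` survives the filter: `wrong` executes but disagrees with the gold answer, and `crashing` raises a `KeyError`. Filtering on execution plus answer agreement is one plausible way a set of "verified samples" could be assembled without manual review.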
