CL, Aug 22, 2025

M3TQA: Massively Multilingual Multitask Table Question Answering

Daixin Shu, Jian Yang, Zhenhe Wu, Xianjie Wu, Xianfu Cheng, Xiangyuan Guan, Yanghai Wang, Pengfei Wu, Tingyang Yang, Hualei Zhu, Wei Zhang, Ge Zhang

arXiv:2508.16265v1, 2 citations, h-index: 18

Originality: Incremental advance

AI Analysis

The paper addresses the geolinguistic imbalance in multilingual table benchmarks, a gap relevant to researchers in natural language processing and AI. The advance is incremental because it builds on existing translation and annotation methods.

The paper tackles the scarcity of multilingual table understanding research by introducing M3TQA, a massively multilingual multitask table question answering framework spanning 97 languages, including underrepresented ones. Its experiments show that synthetically generated QA data can significantly boost performance, particularly for low-resource languages.

Tabular data is a fundamental component of real-world information systems, yet most research in table understanding remains confined to English, leaving multilingual comprehension significantly underexplored. Existing multilingual table benchmarks suffer from geolinguistic imbalance: they overrepresent certain languages and lack sufficient scale for rigorous cross-lingual analysis. To address these limitations, we introduce a comprehensive framework for massively multilingual multitask table question answering, featuring m3TQA-Instruct, a large-scale benchmark spanning 97 languages across diverse language families, including underrepresented and low-resource languages. We construct m3TQA by curating 50 real-world tables in Chinese and English, then applying a robust six-step LLM-based translation pipeline powered by DeepSeek and GPT-4o, achieving high translation fidelity with a median BLEU score of 60.19 as validated through back-translation. The benchmark includes 2,916 professionally annotated question-answering pairs across four tasks designed to evaluate nuanced table reasoning capabilities. Experiments on state-of-the-art LLMs reveal critical insights into cross-lingual generalization, demonstrating that synthetically generated, unannotated QA data can significantly boost performance, particularly for low-resource languages. M3T-Bench establishes a new standard for multilingual table understanding, providing both a challenging evaluation platform and a scalable methodology for future research.
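
The back-translation check mentioned in the abstract can be illustrated with a minimal sketch: translate a source cell into the target language, translate it back, and score the round trip against the original with BLEU, then take the median over all items. The abstract does not state which BLEU implementation, tokenizer, or smoothing the authors used, so the whitespace tokenization, the add-one smoothing, and the function names `sentence_bleu` and `median_back_translation_bleu` below are assumptions for illustration only, and the scores will not match the reported 60.19.

```python
import math
from collections import Counter
from statistics import median


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Smoothed sentence-level BLEU on a 0-100 scale.

    Assumption: whitespace tokenization and add-one smoothing for n > 1;
    the paper's actual settings are not given in the abstract.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no shared words at all
            precision = matches / total
        else:
            # Add-one smoothing so one missing higher-order match does not zero the score.
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty discourages hypotheses shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * brevity_penalty * math.exp(log_precision)


def median_back_translation_bleu(pairs):
    """Median BLEU over (original, back_translated) string pairs."""
    return median(sentence_bleu(original, back) for original, back in pairs)


# Hypothetical round-trip outputs; in practice these would come from the translation models.
example_pairs = [
    ("total revenue by region in 2020", "total revenue by region in 2020"),
    ("the table lists sales by region", "the table shows sales by region"),
    ("average annual rainfall per city", "mean yearly rainfall for each city"),
]
print(round(median_back_translation_bleu(example_pairs), 2))
```

Reporting the median rather than the mean keeps a handful of badly translated cells from dominating the summary figure, which is one plausible reason to prefer it for a fidelity check across many languages.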
