CL · AI · Jul 27, 2025

Multi-Agent Interactive Question Generation Framework for Long Document Understanding

arXiv:2507.20145v1 · 4 citations · h-index: 2 · Has Code · MLSP
AI Analysis

The paper addresses the cost of human annotation for long-context document understanding in low-resource languages such as Arabic, though its contribution is incremental because it builds on existing multi-agent and data generation methods.

The paper tackles the challenge of generating fine-grained training data for long-context document understanding by proposing a fully automated, multi-agent interactive framework. The framework efficiently produces high-quality questions for English and Arabic documents, and the resulting dataset (AraEngLongBench) is challenging for major LVLMs.

Document Understanding (DU) in long-context scenarios with complex layouts remains a significant challenge in vision-language research. Although Large Vision-Language Models (LVLMs) excel at short-context DU tasks, their performance declines in long-context settings. A key limitation is the scarcity of fine-grained training data, particularly for low-resource languages such as Arabic. Existing state-of-the-art techniques rely heavily on human annotation, which is costly and inefficient. We propose a fully automated, multi-agent interactive framework for generating long-context questions. Our approach efficiently generates high-quality single- and multi-page questions for extensive English and Arabic documents, covering hundreds of pages across diverse domains. This facilitates the development of LVLMs with enhanced long-context understanding ability. Our experimental results show that the generated English and Arabic questions (AraEngLongBench) are challenging for major open- and closed-source LVLMs. The code and data are available at https://github.com/wangk0b/Multi_Agentic_QA_Long_Doc.git. Sample question-and-answer (QA) pairs and structured system prompts can be found in the Appendix.

