CL | Jan 8

DocDancer: Towards Agentic Document-Grounded Information Seeking

arXiv:2601.05163v1 | 5 citations | h-index: 9 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for better tool utilization in document-grounded information seeking, though it appears incremental because it builds on existing DocQA methods.

The authors tackle document question answering by introducing DocDancer, an open-source agent framework that explicitly models document exploration and comprehension, and they report that it is effective on the MMLongBench-Doc and DocBench benchmarks.

Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Models trained on the synthesized data show their effectiveness on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis provides valuable insights into agentic tool design and synthetic data.

