CV · CL · Apr 13, 2023

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

arXiv:2304.06447v533 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the need for better document understanding in AI, though it is incremental as it extends existing VQA datasets to multi-page documents.

The authors tackled the problem of document understanding by introducing PDFVQA, a new dataset for visual question answering on multi-page PDF documents, and proposed a graph-based model that improved performance over baselines on various question types.

Document-based Visual Question Answering examines how well a model understands document images when conditioned on natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from several aspects, including document element recognition, document layout structure understanding, contextual understanding, and key information extraction. PDF-VQA extends the scope of existing document understanding, which is limited to a single document page, to questions asked over full documents of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between document elements to improve document structure understanding. Performance is compared with several baselines across different question types and tasks. The full dataset will be released after paper acceptance.

