HC CV IR · Oct 23, 2023

DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

Hao Wang, Qingxuan Wang, Yue Li, Changqing Wang, Chenhui Chu, Rui Wang

arXiv:2310.14802v137.7132 citations · h-index: 8 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the scarcity of datasets available to Document AI researchers. The contribution is incremental, since it provides a new dataset rather than a novel method.

The authors tackled the lack of datasets for Document AI models by introducing DocTrack, a visually-rich document dataset aligned with human eye-movement data, and found that current models still fall short of human-like reading accuracy and flexibility.

The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires the overcoming of technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce DocTrack, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at https://github.com/hint-lab/doctrack.
