Towards the AI Historian: Agentic Information Extraction from Primary Sources

Lorenz Hufe, Niclas Griesshaber, Gavin Greif, Sebastian Oliver Eck, Philip Torr

arXiv:2604.03553 | 78.5 | h-index: 12 | Has Code

AI Analysis

The paper addresses the problem of automating information extraction from primary sources for historians, but its contribution is incremental because it builds on existing vision-language models.

The paper tackles limited AI adoption in historical research by introducing the first module of Chronos, an AI Historian under development. The module converts image scans of primary sources into data through natural-language interaction, letting historians adapt workflows and evaluate AI performance.

AI is supporting, accelerating, and automating scientific discovery across a diverse set of fields. However, AI adoption in historical research remains limited due to the lack of solutions designed for historians. In this technical progress report, we introduce the first module of Chronos, an AI Historian under development. This module enables historians to convert image scans of primary sources into data through natural-language interactions. Rather than imposing a fixed extraction pipeline powered by a vision-language model (VLM), it allows historians to adapt workflows for heterogeneous source corpora, evaluate the performance of AI models on specific tasks, and iteratively refine workflows through natural-language interaction with the Chronos agent. The module is open-source and ready to be used by historical researchers on their own sources.
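
The abstract mentions evaluating AI models on specific tasks without detailing how. As a minimal sketch, assuming the extraction output is a list of field-to-value records and a historian has hand-labeled a small gold sample, per-field accuracy could be computed as follows. The function names, record fields, and sample values are hypothetical illustrations and are not taken from Chronos.

```python
from typing import Dict, List

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not counted as errors."""
    return " ".join(value.lower().split())


def field_accuracy(predicted: List[Record], gold: List[Record]) -> Dict[str, float]:
    """Return, per field, the share of gold-labeled records whose extracted value matches."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of records")
    # Only fields that appear in the gold labels are scored.
    fields = sorted({name for record in gold for name in record})
    scores: Dict[str, float] = {}
    for name in fields:
        labeled = [(p, g) for p, g in zip(predicted, gold) if name in g]
        hits = sum(
            normalize(p.get(name, "")) == normalize(g[name]) for p, g in labeled
        )
        scores[name] = hits / len(labeled)
    return scores


# Hypothetical hand-labeled sample and model output for two register entries.
gold = [
    {"name": "Anna Meyer", "year": "1847"},
    {"name": "Johann Krause", "year": "1851"},
]
predicted = [
    {"name": "anna  meyer", "year": "1847"},
    {"name": "Johann Krause", "year": "1857"},
]

scores = field_accuracy(predicted, gold)
print(scores)
```

Exact match after light normalization is a deliberately strict choice. For noisy handwriting or variant spellings, a historian might prefer an edit-distance threshold, which would slot into the comparison inside `field_accuracy`.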
