SEAI · May 22, 2025

LLM Agents for Interactive Exploration of Historical Cadastre Data: Framework and Application to Venice

arXiv:2505.17148v2 · h-index: 6 · Computational Humanities Research
Originality: Synthesis-oriented
AI Analysis

This work addresses the analysis of complex historical urban data for historians and urban researchers. Its contribution is incremental: it applies existing LLM methods to a new domain-specific dataset.

The researchers tackled the challenge of analyzing non-standardized historical cadastre data by developing a text-to-programs framework using Large Language Models, which enabled spatial queries and reconstruction of population and property information in Venice from 1740 to 1808 with improved interpretability and reduced hallucination.

Cadastral data reveal key information about the historical organization of cities but are often non-standardized due to diverse formats and human annotations, complicating large-scale analysis. We explore as a case study Venice's urban history during the critical period from 1740 to 1808, capturing the transition following the fall of the ancient Republic and the Ancien Régime. This era's complex cadastral data, marked by its volume and lack of uniform structure, presents unique challenges that our approach adeptly navigates, enabling us to generate spatial queries that bridge past and present urban landscapes. We present a text-to-programs framework that leverages Large Language Models (LLMs) to process natural language queries as executable code for analyzing historical cadastral records. Our methodology implements two complementary techniques: a SQL agent for handling structured queries about specific cadastral information, and a coding agent for complex analytical operations requiring custom data manipulation. We propose a taxonomy that classifies historical research questions based on their complexity and analytical requirements, mapping them to the most appropriate technical approach. This framework is supported by an investigation into the execution consistency of the system, alongside a qualitative analysis of the answers it produces. By ensuring interpretability and minimizing hallucination through verifiable program outputs, we demonstrate the system's effectiveness in reconstructing past population information, property features, and spatiotemporal comparisons in Venice.
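The text-to-programs idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `cadastre` table, its columns, the toy rows, and the `stub_llm` function (a stand-in for a real model call) are all hypothetical, and the consistency measure shown here, the share of repeated runs that agree on the most common result, is only one plausible reading of "execution consistency".

```python
import sqlite3
from collections import Counter

# Hypothetical schema and invented toy rows, for illustration only.
SCHEMA = (
    "CREATE TABLE cadastre "
    "(year INTEGER, parish TEXT, owner TEXT, kind TEXT, rent REAL)"
)
ROWS = [
    (1740, "San Polo", "Owner A", "bottega", 20.0),
    (1740, "San Polo", "Owner B", "casa", 12.0),
    (1740, "Castello", "Owner C", "casa", 8.0),
    (1808, "San Polo", "Owner A", "casa", 15.0),
]


def build_db():
    """Create an in-memory database holding the toy cadastre table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO cadastre VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def stub_llm(question):
    """Stand-in for an LLM: maps a question to a canned SQL program.

    A real SQL agent would prompt a model with the schema and the question.
    """
    if "how many" in question.lower():
        return (
            "SELECT COUNT(*) FROM cadastre "
            "WHERE year = 1740 AND parish = 'San Polo'"
        )
    return "SELECT AVG(rent) FROM cadastre WHERE year = 1740"


def answer(conn, question, generate=stub_llm):
    """Generate a SQL program for the question and execute it.

    Returning the program together with its result keeps the answer
    verifiable: a reader can inspect the exact query that produced it.
    """
    sql = generate(question)
    result = conn.execute(sql).fetchall()
    return sql, result


def execution_consistency(conn, question, generate=stub_llm, runs=5):
    """Share of runs that agree with the most common successful result."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            _, result = answer(conn, question, generate)
            outcomes[repr(result)] += 1
        except sqlite3.Error:
            pass  # a failed program counts against consistency
    if not outcomes:
        return 0.0
    return outcomes.most_common(1)[0][1] / runs


if __name__ == "__main__":
    conn = build_db()
    question = "How many properties were recorded in San Polo in 1740?"
    sql, result = answer(conn, question)
    print(sql)
    print(result)
    print(execution_consistency(conn, question))
```

Because the stub is deterministic, the consistency score here is trivially 1.0. With a sampled model, repeated generations may yield different programs, which is where a measure of this kind becomes informative.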
