CL Aug 26, 2025

Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index

arXiv:2508.19093v1
1 citation
Academia AI and Applications
Originality: Synthesis-oriented
AI Analysis

The framework gives art historians and cultural heritage professionals a scalable way to conduct provenance research more efficiently. The contribution is incremental, however, because it applies an existing RAG method to a new domain.

The researchers address fragmented, multilingual archival data, which hinders efficient art provenance searches, by developing a Retrieval-Augmented Generation (RAG) framework for the Getty Provenance Index. The framework enables natural-language and multilingual searches and was tested on a 10,000-record sample from the German Sales dataset.

This research presents a Retrieval-Augmented Generation (RAG) framework for art provenance studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. The process is complicated by fragmented, multilingual archival data that hinders efficient retrieval. Current search portals require precise metadata, limiting exploratory searches. Our method enables natural-language and multilingual searches through semantic retrieval and contextual summarization, reducing dependence on metadata structures. We assess RAG's capability to retrieve and summarize auction records using a 10,000-record sample from the Getty Provenance Index - German Sales. The results show this approach provides a scalable solution for navigating art market archives, offering a practical tool for historians and cultural heritage professionals conducting historically sensitive research.
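The retrieve-then-summarize flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the records are invented (they are not taken from the Getty Provenance Index), the function names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical, and the character n-gram vectors are only a stand-in for the semantic embeddings the paper refers to. In particular, n-gram overlap does not give cross-lingual matching; a real system would likely use a multilingual embedding model and pass the assembled prompt to a language model for summarization.

```python
import math
from collections import Counter

# Hypothetical auction records, invented for this sketch.
RECORDS = [
    {"id": "r1", "text": "Landschaft mit Kühen, Öl auf Leinwand, Auktion Berlin 1935"},
    {"id": "r2", "text": "Bildnis einer Dame, Aquarell, Auktion München 1938"},
    {"id": "r3", "text": "Stillleben mit Blumen, Öl auf Holz, Auktion Berlin 1936"},
]

def embed(text, n=3):
    """Stand-in embedding: counts of character n-grams in the lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, records, k=2):
    """Return the k records most similar to the free-text query."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, hits):
    """Assemble the context that a language model would be asked to summarize."""
    context = "\n".join(f"[{h['id']}] {h['text']}" for h in hits)
    return (
        "Answer using only the records below and cite record ids.\n\n"
        f"Records:\n{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    question = "Stillleben Blumen"
    print(build_prompt(question, retrieve(question, RECORDS)))
```

The point of the sketch is the separation of steps: the query is matched against record text rather than against structured metadata fields, and only the top-ranked records are handed to the summarization step.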

