CL | AI | Nov 4, 2025

A Preliminary Study of RAG for Taiwanese Historical Archives

arXiv:2511.07445v1 | 2 citations | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This is an incremental study that applies an existing method, RAG, to new data. It is aimed at researchers in historical archives and NLP.

The study applies Retrieval-Augmented Generation (RAG) to Taiwanese historical archives. It finds that early-stage metadata integration improves both retrieval and answer accuracy, while hallucinations during generation and temporal or multi-hop queries remain open challenges.

Retrieval-Augmented Generation (RAG) has emerged as a promising approach for knowledge-intensive tasks. However, few studies have examined RAG for Taiwanese Historical Archives. In this paper, we present an initial study of a RAG pipeline applied to two historical Traditional Chinese datasets, Fort Zeelandia and the Taiwan Provincial Council Gazette, along with their corresponding open-ended query sets. We systematically investigate the effects of query characteristics and metadata integration strategies on retrieval quality, answer generation, and the performance of the overall system. The results show that early-stage metadata integration enhances both retrieval and answer accuracy while also revealing persistent challenges for RAG systems, including hallucinations during generation and difficulties in handling temporal or multi-hop historical queries.
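As a rough illustration of what early-stage metadata integration could look like, the sketch below folds each record's metadata into the text before it is indexed, so the retriever can match on fields such as the year. This is a minimal sketch under stated assumptions, not the paper's pipeline: the abstract does not define the integration strategies in detail, the toy records and field names (`source`, `year`) are invented, and a bag-of-words cosine score stands in for whatever retriever the authors used.

```python
import math
import re
from collections import Counter

# Toy records. Texts, ids, and metadata fields are invented for illustration.
DOCS = [
    {"id": "a", "text": "The council discussed the irrigation budget.",
     "meta": {"source": "gazette", "year": "1951"}},
    {"id": "b", "text": "The council discussed the irrigation budget again.",
     "meta": {"source": "gazette", "year": "1962"}},
    {"id": "c", "text": "A ship arrived with silk and sugar.",
     "meta": {"source": "zeelandia", "year": "1645"}},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def indexed_text(doc, early_metadata):
    """Return the string that gets indexed for one record.

    With early_metadata=True the metadata fields are prepended to the body,
    which is the assumed meaning of "early-stage" integration here.
    """
    if not early_metadata:
        return doc["text"]
    meta = " ".join(f"{key} {value}" for key, value in sorted(doc["meta"].items()))
    return f"{meta} {doc['text']}"


def build_index(docs, early_metadata):
    """Map each record id to a bag-of-words vector of its indexed text."""
    return {d["id"]: Counter(tokenize(indexed_text(d, early_metadata))) for d in docs}


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=1):
    """Return the ids of the k records most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    query = "council irrigation budget 1962"
    print("text only:    ", retrieve(query, build_index(DOCS, early_metadata=False)))
    print("with metadata:", retrieve(query, build_index(DOCS, early_metadata=True)))
```

On this toy data, the year-specific query ranks the 1962 record first only when the metadata is part of the indexed text. With text alone, the two near-identical council records are separated only by length, and the 1951 record ranks first. A late-stage alternative would keep the index text-only and apply metadata afterwards, for example as a filter or as extra context passed to the generator.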

