IR, AI, CV | May 4, 2025

Explainable Coarse-to-Fine Ancient Manuscript Duplicates Discovery

arXiv:2505.03836v2 | 1 citation | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate duplicate detection in ancient manuscripts, aiding archaeological curation and historical studies, though it is incremental as it builds on existing image retrieval and matching methods.

The paper tackles the problem of identifying duplicates in ancient manuscripts, specifically Oracle Bones, by proposing a coarse-to-fine framework that combines keypoints and text-based matching, achieving comparable recall and top-ranked retrieval scores while discovering over 60 new duplicate pairs missed by experts.

Ancient manuscripts are the primary source of ancient linguistic corpora. However, many ancient manuscripts exhibit duplications due to unintentional repeated publication or deliberate forgery. The Dead Sea Scrolls, for example, include counterfeit fragments, whereas Oracle Bones (OB) contain both republished materials and fabricated specimens. Identifying ancient manuscript duplicates is of great significance for both archaeological curation and the study of ancient history. In this work, we design a progressive OB duplicate discovery framework that combines unsupervised low-level keypoint matching with high-level text-centric content-based matching to refine and rank the candidate OB duplicates with semantic awareness and interpretability. We compare our model with state-of-the-art content-based image retrieval and image matching methods, showing that our model yields comparable recall performance and the highest simplified mean reciprocal rank scores for both Top-5 and Top-15 retrieval results, with significantly faster computation. In real-world deployment, we have discovered over 60 pairs of new OB duplicates that had been missed by domain experts for decades. Code, model and real-world results are available at: https://github.com/cszhangLMU/OBD-Finder/.
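The abstract reports recall and a "simplified mean reciprocal rank" at Top-5 and Top-15 but does not define them here. As a rough illustration only, the sketch below computes the standard mean reciprocal rank at k and a top-k hit rate over ranked candidate lists. The function names, the toy identifiers, and the choice of the standard (not the paper's simplified) formula are all assumptions made for this example; the paper's exact definition may differ.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]],
             truth: Dict[str, Set[str]],
             k: int) -> float:
    """Standard mean reciprocal rank over the top-k candidates.

    Assumption: this is the textbook MRR@k, not necessarily the
    "simplified" variant used in the paper.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query, ranked in rankings.items():
        relevant = truth.get(query, set())
        # Credit only the first true duplicate found within the top k.
        for rank, candidate in enumerate(ranked[:k], start=1):
            if candidate in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


def hit_rate_at_k(rankings: Dict[str, List[str]],
                  truth: Dict[str, Set[str]],
                  k: int) -> float:
    """Fraction of queries with at least one true duplicate in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for query, ranked in rankings.items()
        if truth.get(query, set()) & set(ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical toy data: each query image maps to a ranked candidate list.
toy_rankings = {
    "q1": ["a", "b", "c"],   # true duplicate "b" is ranked 2nd
    "q2": ["x", "y", "z"],   # true duplicate "x" is ranked 1st
    "q3": ["m", "n", "o"],   # true duplicate is not retrieved
}
toy_truth = {"q1": {"b"}, "q2": {"x"}, "q3": {"p"}}

if __name__ == "__main__":
    print(mrr_at_k(toy_rankings, toy_truth, k=3))       # (1/2 + 1 + 0) / 3 = 0.5
    print(hit_rate_at_k(toy_rankings, toy_truth, k=3))  # 2 of 3 queries hit
```

With k set to 5 or 15, the same functions would give Top-5 and Top-15 style scores, assuming each query comes with a ranked candidate list and a set of known duplicates.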

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
