CV · Feb 15, 2020

Historical Document Processing: A Survey of Techniques, Tools, and Trends

arXiv:2002.06300v2 · 32 citations

AI Analysis

The paper addresses the need of cultural heritage institutions to transcribe their scanned historical archives; as a survey, its contribution is incremental rather than novel.

This paper surveys techniques, tools, and trends in Historical Document Processing, which digitizes historical documents for use by historians and scholars, summarizing major phases, algorithms, and datasets without presenting new experimental results.

Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from various subfields of computer science, including computer vision, document analysis and recognition, natural language processing, and machine learning, to convert images of ancient manuscripts, letters, diaries, and early printed texts automatically into a digital format usable in data mining and information retrieval systems. Within the past twenty years, as libraries, museums, and other cultural heritage institutions have scanned an increasing volume of their historical document archives, the need to transcribe the full text from these collections has become acute. Since Historical Document Processing encompasses multiple sub-domains of computer science, knowledge relevant to its purpose is scattered across numerous journals and conference proceedings. This paper surveys the major phases, standard algorithms, tools, and datasets in the field of Historical Document Processing, discusses the results of a literature review, and suggests directions for further research.
