CL · DL · Nov 3, 2020

Exhaustive Entity Recognition for Coptic: Challenges and Solutions

arXiv:2011.02068v1 · 991 citations
AI Analysis

This work addresses semantic access to ancient texts for digital humanities researchers. Its contribution is incremental, since it adapts existing NLP methods to a specific domain.

The paper tackles entity recognition for Coptic, a low-resource and morphologically complex language. Its solutions rely on dependency parsing, CRF models, and hand-crafted resources, and they achieve high accuracy with orders of magnitude less data than is used for high-resource languages.

Entity recognition provides semantic access to ancient materials in the Digital Humanities: it exposes people and places of interest in texts that cannot be read exhaustively, facilitates linking resources and can provide a window into text contents, even for texts with no translations. In this paper we present entity recognition for Coptic, the language of Hellenistic era Egypt. We evaluate NLP approaches to the task and lay out difficulties in applying them to a low-resource, morphologically complex language. We present solutions for named and non-named nested entity recognition and semi-automatic entity linking to Wikipedia, relying on robust dependency parsing, feature-based CRF models, and hand-crafted knowledge base resources, enabling high accuracy NER with orders of magnitude less data than those used for high resource languages. The results suggest avenues for research on other languages in similar settings.
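The abstract credits robust dependency parsing for nested, non-named entity recognition. One way a parse can yield nested candidates is to treat the subtree of each nominal head as a candidate span: since subtrees nest, the spans nest too. The sketch below assumes that approach, projective trees, and Universal Dependencies labels; it is not the authors' implementation. `Token`, `NOMINAL_TAGS`, `EXCLUDED_RELS`, `subtree_span` and `candidate_spans` are hypothetical names, and an English gloss stands in for a Coptic sentence.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str
    upos: str     # Universal POS tag, e.g. "NOUN", "PROPN"
    head: int     # index of the syntactic head; 0 marks the root
    deprel: str   # dependency relation to the head, e.g. "det", "case"


# Hypothetical choices for illustration, not the paper's settings.
NOMINAL_TAGS = {"NOUN", "PROPN", "PRON"}
EXCLUDED_RELS = {"case", "punct"}  # e.g. a governing preposition stays outside the span


def subtree_span(tokens, head_idx):
    """Return 1-based inclusive (start, end) bounds of the subtree rooted at
    head_idx, skipping dependents whose relation is in EXCLUDED_RELS.
    Assumes a projective tree, so min..max is a contiguous token range."""
    children = {}
    for t in tokens:
        if t.deprel not in EXCLUDED_RELS:
            children.setdefault(t.head, []).append(t.idx)
    covered, stack = [], [head_idx]
    while stack:
        i = stack.pop()
        covered.append(i)
        stack.extend(children.get(i, []))
    return min(covered), max(covered)


def candidate_spans(tokens):
    """One candidate entity span per nominal head, as (start, end, text).
    Assumes tokens are ordered so that tokens[k].idx == k + 1."""
    spans = []
    for t in tokens:
        if t.upos in NOMINAL_TAGS:
            start, end = subtree_span(tokens, t.idx)
            text = " ".join(tok.form for tok in tokens[start - 1:end])
            spans.append((start, end, text))
    return spans


# English gloss of a nominal phrase, hand-annotated in UD style.
sentence = [
    Token(1, "the", "DET", 2, "det"),
    Token(2, "son", "NOUN", 0, "root"),
    Token(3, "of", "ADP", 5, "case"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "king", "NOUN", 2, "nmod"),
    Token(6, "of", "ADP", 7, "case"),
    Token(7, "Egypt", "PROPN", 5, "nmod"),
]

for start, end, text in candidate_spans(sentence):
    print(start, end, text)
# 1 7 the son of the king of Egypt
# 4 7 the king of Egypt
# 7 7 Egypt
```

Each candidate would still need an entity type (person, place, and so on) from a separate classifier, which this sketch omits. Excluding `case` dependents keeps the preposition that governs "the king of Egypt" out of that span, while the inner "of" stays inside because the span is the contiguous range between the outermost covered tokens.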
