CL, May 21, 2020

The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

arXiv:2005.10790v1, 10 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of processing Medieval Latin for digital humanities research. Its contribution is incremental: it builds on existing lemmatization methods and adds components such as word embeddings and graphs.

The authors address the lemmatization of Medieval Latin texts by developing the Frankfurt Latin Lexicon (FLL), a resource that integrates morphological expansion, word embeddings, and SemioGraphs. They test lemmatizers against the Capitularies corpus and use the FLL for both lemmatization and the post-editing of lemmatizations.

In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversing by means of SemioGraphs completes the digitally enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.
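
As a rough illustration of what lexicon-based lemmatization through morphological expansion can look like, the sketch below expands a few lemmas into inflected forms and inverts the result into a form-to-lemma lookup. It is a toy example under stated assumptions, not the FLL's actual procedure: the function names (`expand`, `build_lookup`, `lemmatize`), the restriction to first-declension nouns, and the sample lemmas are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Assumption for illustration only: the distinct written endings of the
# Latin first declension (singular and plural). A real lexicon covers far
# more paradigms and spelling variants.
FIRST_DECLENSION_ENDINGS = ["a", "ae", "am", "arum", "is", "as"]


def expand(lemma):
    """Generate the inflected forms of a first-declension noun lemma in -a."""
    if not lemma.endswith("a"):
        raise ValueError(f"toy expander only handles lemmas in -a: {lemma!r}")
    stem = lemma[:-1]
    return {stem + ending for ending in FIRST_DECLENSION_ENDINGS}


def build_lookup(lemmas):
    """Invert the expansion: map every generated form to its candidate lemmas."""
    lookup = defaultdict(set)
    for lemma in lemmas:
        for form in expand(lemma):
            lookup[form].add(lemma)
    return lookup


def lemmatize(tokens, lookup):
    """Return (token, candidate lemmas) pairs; an empty list marks an unknown form."""
    return [(token, sorted(lookup.get(token.lower(), ()))) for token in tokens]


# Hypothetical mini-lexicon and input tokens.
lookup = build_lookup(["terra", "ecclesia"])
result = lemmatize(["Terram", "ecclesiarum", "rex"], lookup)
for token, candidates in result:
    print(token, candidates)
```

In this sketch, tokens that come back with an empty candidate list (here `rex`, which the toy lexicon does not cover) are the kind of cases that would be handed to a post-editing step.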
