Corpus and Models for Lemmatisation and POS-tagging of Old French
This work addresses a specific need for researchers in historical linguistics by providing tools for Old French, but it appears incremental as part of an ongoing project.
The paper tackles lemmatisation and POS-tagging for Old French, an under-resourced historical language with high linguistic variation, by developing neural taggers and dedicated corpora, though no concrete performance numbers are provided.
Old French is a typical example of an under-resourced historical language that, moreover, displays a considerable amount of linguistic variation. In this paper, we present the current results of a long-running project (2015-...) and describe how we approached the difficult question of providing lemmatisation and POS models for Old French with the help of neural taggers and the progressive constitution of dedicated corpora.