CL

Mar 18, 2022

CaMEL: Case Marker Extraction without Labels

Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze

Oxford

arXiv:2203.10010v2

639 citations

h-index: 70

Originality: Incremental advance

AI Analysis

This work addresses a challenge in computational morphology for low-resource languages. It is an incremental advance in that it builds on existing tools, namely a noun phrase chunker and an alignment system.

The authors tackle the problem of extracting case markers in low-resource languages without labeled data. Their model is applied to 83 languages and evaluated against a silver standard constructed automatically from UniMorph.

We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.
