Annotating Cognates and Etymological Origin in Turkic Languages
This work addresses a problem specific to computational linguistics and Turkic language studies: how to annotate cognates and etymological origins in Turkic languages. It takes an incremental approach to annotation methodology, presenting a method that balances annotation effort against the utility of the annotations for improving automated translation lexicon induction.
Turkic languages exhibit extensive and diverse etymological relationships among lexical items. These relationships make Turkic languages promising for exploring automated translation lexicon induction that leverages cognate and other etymological information. However, because the relationships between words are so extensive and varied in type, it is not clear how to annotate such information. In this paper, we present a methodology for annotating cognates and etymological origin in Turkic languages. Our method strives to balance the amount of research effort the annotator expends against the utility of the annotations for supporting research on improving automated translation lexicon induction.