CL Jul 24, 2022

Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish

arXiv:2207.11782v1, 12 citations, h-index: 30
Originality: Synthesis-oriented
AI Analysis

This work addresses linguistic representation problems for Turkish NLP, but it is incremental because it builds on an existing treebank and an existing annotation framework.

The study addresses gaps in how the BOUN Treebank represents the agglutinative features of Turkish by introducing new annotation conventions within the Universal Dependencies framework. The re-annotated treebank was tested with an LSTM-based dependency parser, and an updated version of the BoAT Tool was introduced.

In this study, we aim to offer linguistically motivated solutions to the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank without diverging from the Universal Dependencies framework. To tackle these issues, new annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) tab in the UD framework to denote derivation. The representational capabilities of the re-annotated treebank were tested on an LSTM-based dependency parser, and an updated version of the BoAT Tool is introduced.
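
To make the MISC-based convention more concrete, the following is a minimal sketch of how derivation information stored in the MISC column (the tenth field of a CoNLL-U token line) could be read. The key name `Derivation`, the example token line, and the helper names `parse_misc` and `parse_token` are all hypothetical and used only for illustration; they are not taken from the BOUN Treebank or the BoAT Tool.

```python
# CoNLL-U token lines have ten tab-separated fields; MISC is the last one.
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_misc(misc):
    """Turn a MISC value such as 'A=1|B=2' into a dict; '_' means empty."""
    if misc == "_":
        return {}
    pairs = {}
    for item in misc.split("|"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def parse_token(line):
    """Split one CoNLL-U token line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    token = dict(zip(FIELDS, values))
    token["MISC"] = parse_misc(token["MISC"])
    return token


# Hypothetical token line: the 'Derivation' key and all annotation values
# are assumptions for illustration, not the treebank's actual annotation.
example = "\t".join([
    "1", "gözlük", "gözlük", "NOUN", "_",
    "Case=Nom|Number=Sing", "0", "root", "_", "Derivation=lük",
])

token = parse_token(example)
print(token["FORM"], token["MISC"])
```

Keeping derivation in MISC leaves the other nine columns untouched, so tools that ignore MISC can still read the file as ordinary UD data.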
