Syntactic Transfer to Kyrgyz Using the Treebank Translation Method
This work addresses resource scarcity in Kyrgyz language processing by offering a method that simplifies corpus development. The contribution is incremental, since it builds on existing translation techniques.
The study addresses the difficulty of creating high-quality syntactic corpora for Kyrgyz, a low-resource language. It proposes a tool that transfers syntactic annotations from Turkish to Kyrgyz using a treebank translation method. The tool achieves higher syntactic annotation accuracy than a monolingual model trained on the Kyrgyz KTMU treebank.
Kyrgyz is a low-resource language, and creating high-quality syntactic corpora for it requires significant effort. This study proposes an approach that simplifies the development of a syntactic corpus for Kyrgyz. We present a tool that transfers syntactic annotations from Turkish to Kyrgyz using a treebank translation method. The tool was evaluated on the TueCL treebank. The results show that this approach achieves higher syntactic annotation accuracy than a monolingual model trained on the Kyrgyz KTMU treebank. The study also introduces a method for assessing the complexity of manually annotating the resulting syntactic trees, which supports further optimization of the annotation process.
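To make the idea concrete, the following is a minimal sketch of treebank translation and of one common way to score syntactic annotation accuracy. It is not the tool described above. It assumes a one-to-one token correspondence between the Turkish source and the Kyrgyz output, which real sentence pairs will not always satisfy, and it assumes accuracy is measured with unlabeled and labeled attachment scores (UAS and LAS), which the abstract does not state. The `Token` class, the `translate_tree` and `attachment_scores` functions, the toy lexicon, and the gold tree are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word form
    head: int     # index of the syntactic head (0 marks the root)
    deprel: str   # dependency relation label


def translate_tree(tokens, lexicon):
    """Swap each word form via the lexicon; heads and labels are copied as-is.

    Assumption: every source token maps to exactly one target token.
    Words missing from the lexicon are left unchanged.
    """
    return [replace(t, form=lexicon.get(t.form.lower(), t.form)) for t in tokens]


def attachment_scores(predicted, gold):
    """Return (UAS, LAS): share of tokens with correct head / head and label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need two non-empty sentences of equal length")
    pairs = list(zip(predicted, gold))
    uas_hits = sum(p.head == g.head for p, g in pairs)
    las_hits = sum(p.head == g.head and p.deprel == g.deprel for p, g in pairs)
    return uas_hits / len(gold), las_hits / len(gold)


# Toy Turkish tree for "Ben eve geldim" ("I came home").
turkish = [
    Token(1, "Ben", 3, "nsubj"),
    Token(2, "eve", 3, "obl"),
    Token(3, "geldim", 0, "root"),
]

# Hypothetical word-level lexicon (illustrative only).
lexicon = {"ben": "мен", "eve": "үйгө", "geldim": "келдим"}

kyrgyz = translate_tree(turkish, lexicon)

# Hypothetical gold annotation that disagrees on one label,
# used only to show that LAS can fall below UAS.
gold = [
    Token(1, "мен", 3, "nsubj"),
    Token(2, "үйгө", 3, "obj"),
    Token(3, "келдим", 0, "root"),
]

uas, las = attachment_scores(kyrgyz, gold)
print([t.form for t in kyrgyz])
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In this sketch, the tokens whose head or label differs from the gold tree are the ones an annotator would have to correct, so a count of such mismatches could serve as a rough proxy for manual annotation effort. Whether the study's complexity measure works this way is not stated in the abstract.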