CL · Dec 17, 2024

Syntactic Transfer to Kyrgyz Using the Treebank Translation Method

arXiv:2412.13146v1 · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of resource scarcity for Kyrgyz language processing, offering a method to simplify corpus development, though it is incremental as it builds on existing translation techniques.

The study addresses the difficulty of building high-quality syntactic corpora for Kyrgyz, a low-resource language. It proposes a tool that transfers syntactic annotations from Turkish to Kyrgyz using a treebank translation method. Evaluated on the TueCL treebank, the tool achieves higher syntactic annotation accuracy than a monolingual model trained on the Kyrgyz KTMU treebank.

The Kyrgyz language, as a low-resource language, requires significant effort to create high-quality syntactic corpora. This study proposes an approach to simplify the development process of a syntactic corpus for Kyrgyz. We present a tool for transferring syntactic annotations from Turkish to Kyrgyz based on a treebank translation method. The effectiveness of the proposed tool was evaluated using the TueCL treebank. The results demonstrate that this approach achieves higher syntactic annotation accuracy compared to a monolingual model trained on the Kyrgyz KTMU treebank. Additionally, the study introduces a method for assessing the complexity of manual annotation for the resulting syntactic trees, contributing to further optimization of the annotation process.
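
To make the idea of treebank translation concrete, the sketch below shows one simple reading of it: keep the dependency structure of a Turkish CoNLL-U sentence and replace only the word forms with Kyrgyz equivalents, then score a predicted tree against a reference with unlabeled and labeled attachment scores (UAS and LAS). This page does not describe how the paper's tool works internally, so the word-for-word substitution, the three-entry `TOY_LEXICON`, and the helper names `parse_conllu`, `translate_tree`, and `attachment_scores` are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    idx: int      # 1-based token position (CoNLL-U ID column)
    form: str     # surface word form
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


# Hypothetical toy lexicon (an assumption for illustration only).
TOY_LEXICON = {"Ben": "Мен", "kitap": "китеп", "okudum": "окудум"}

# A three-token Turkish sentence in CoNLL-U layout (10 tab-separated columns).
TURKISH_SENTENCE = "\n".join(
    "\t".join(row)
    for row in [
        ["1", "Ben", "ben", "PRON", "_", "_", "3", "nsubj", "_", "_"],
        ["2", "kitap", "kitap", "NOUN", "_", "_", "3", "obj", "_", "_"],
        ["3", "okudum", "oku", "VERB", "_", "_", "0", "root", "_", "_"],
    ]
)


def parse_conllu(block: str) -> list[Token]:
    """Read the basic token lines of one CoNLL-U sentence."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def translate_tree(tokens: list[Token], lexicon: dict[str, str]) -> list[Token]:
    """Swap word forms via the lexicon; heads and relations are left untouched."""
    return [replace(t, form=lexicon.get(t.form, t.form)) for t in tokens]


def attachment_scores(gold: list[Token], pred: list[Token]) -> tuple[float, float]:
    """Return (UAS, LAS) for two trees over the same token sequence."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("trees must be non-empty and of equal length")
    uas = sum(g.head == p.head for g, p in zip(gold, pred))
    las = sum(g.head == p.head and g.deprel == p.deprel for g, p in zip(gold, pred))
    return uas / len(gold), las / len(gold)


source_tree = parse_conllu(TURKISH_SENTENCE)
kyrgyz_tree = translate_tree(source_tree, TOY_LEXICON)
```

A one-to-one substitution like this only works when source and target sentences share word order and tokenization, which is a simplifying assumption here; a practical tool would also need to handle words missing from the lexicon and morphological differences between the two languages.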

Code Implementations: 1 repo