CL · May 19, 2022

A machine transliteration tool between Uzbek alphabets

arXiv:2205.09578v1 · 29 citations · h-index: 30 · Has Code
Originality: Synthesis-oriented
AI Analysis

This tool addresses the need for script conversion in Uzbek, a low-resource language, but the contribution is incremental: it applies existing methods to a new dataset.

The paper tackles machine transliteration for Uzbek, a low-resource language, by developing a tool that converts words among three scripts (the old Cyrillic, the official Latin, and the new Latin alphabets). The result is an open-source Python package and a web application with a public API.

Machine transliteration, as defined in this paper, is the process of automatically converting words written in a source alphabet into a target alphabet within the same language, while preserving their meaning and pronunciation. The main goal of this paper is to present a machine transliteration tool between three common scripts used in the low-resource Uzbek language: the old Cyrillic, the currently official Latin, and the newly announced New Latin alphabets. The tool was built using a combination of rule-based and fine-tuning approaches. It is available as an open-source Python package and as a web-based application that includes a public API. To our knowledge, this is the first machine transliteration tool that supports the newly announced Latin alphabet of the Uzbek language.
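
To make the rule-based half of that description concrete, here is a minimal sketch of character-level Cyrillic-to-Latin conversion. It is not the paper's implementation or its package API: the function name `transliterate`, the table `CYR_TO_LAT`, and the small subset of letter correspondences are assumptions chosen for illustration. Context-dependent rules and the fine-tuning step mentioned in the abstract are left out, and the letter oʻ is written with a plain ASCII apostrophe for simplicity.

```python
# Hypothetical, illustrative subset of Uzbek Cyrillic -> official Latin
# correspondences. This is NOT the rule set used by the paper's tool.
CYR_TO_LAT = {
    "а": "a", "б": "b", "з": "z", "и": "i", "к": "k",
    "о": "o", "р": "r", "т": "t", "ҳ": "h",
    "е": "e",    # context-dependent rules for this letter are omitted here
    "ш": "sh",   # one Cyrillic letter can map to a Latin digraph
    "ў": "o'",   # ASCII apostrophe used for simplicity
}


def transliterate(text, table=CYR_TO_LAT):
    """Convert text letter by letter using a lookup table.

    Characters missing from the table are copied unchanged, and the
    case of the first output letter follows the case of the input letter.
    """
    out = []
    for ch in text:
        lower = ch.lower()
        mapped = table.get(lower)
        if mapped is None:
            out.append(ch)                    # unknown character: keep as is
        elif ch != lower:
            out.append(mapped.capitalize())   # uppercase input: "Ш" -> "Sh"
        else:
            out.append(mapped)
    return "".join(out)


print(transliterate("китоб"))   # kitob
print(transliterate("шаҳар"))   # shahar
print(transliterate("Ўзбек"))   # O'zbek
```

A plain lookup table like this covers only one-to-one and one-to-many letter mappings. Letters whose output depends on their position in a word, and the reverse direction where Latin digraphs must be matched before single letters, need extra rules on top of it.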
