CL · May 19, 2022

A machine transliteration tool between Uzbek alphabets

Ulugbek Salaev, Elmurod Kuriyozov, Carlos Gómez-Rodríguez

arXiv:2205.09578v12.329 citations · h-index: 30 · Has Code

Originality: Synthesis-oriented

AI Analysis

This tool addresses the need for script conversion in Uzbek, a low-resource language, but is incremental as it applies existing methods to a new dataset.

The paper tackles machine transliteration for Uzbek, a low-resource language, by developing a tool that converts words among three scripts (the old Cyrillic, official Latin, and New Latin alphabets). The result is an open-source Python package and a web application with a public API.

Machine transliteration, as defined in this paper, is the process of automatically converting words written in a source alphabet into a target alphabet of the same language while preserving their meaning and pronunciation. The main goal of this paper is to present a machine transliteration tool between three common scripts used for Uzbek, a low-resource language: the old Cyrillic alphabet, the currently official Latin alphabet, and the newly announced New Latin alphabet. The tool was built using a combination of rule-based and fine-tuning approaches. It is available as an open-source Python package and as a web-based application with a public API. To our knowledge, this is the first machine transliteration tool that supports the newly announced Latin alphabet of the Uzbek language.
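To make the rule-based part of the approach concrete, here is a minimal sketch of character-level Cyrillic-to-Latin conversion. It is not the authors' implementation or the package's API: the function name `cyrillic_to_latin`, the table `CYR_TO_LAT`, and the reduced rule set are assumptions chosen for illustration. The sketch covers only a subset of letters and a single context-dependent rule, and it omits the New Latin alphabet and the fine-tuning step entirely.

```python
# Illustrative subset of Uzbek Cyrillic -> official Latin rules.
# Hypothetical table for this sketch; not the rule set shipped with the tool.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "yo", "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "x", "ч": "ch",
    "ш": "sh", "э": "e", "ю": "yu", "я": "ya",
    "ў": "o\u02bb", "қ": "q", "ғ": "g\u02bb", "ҳ": "h",
}


def cyrillic_to_latin(word: str) -> str:
    """Transliterate one word character by character (simplified sketch)."""
    out = []
    for i, ch in enumerate(word):
        lower = ch.lower()
        if lower == "е" and i == 0:
            # Context-dependent rule: word-initial "е" is written "ye".
            lat = "ye"
        else:
            # Characters missing from the table pass through unchanged.
            lat = CYR_TO_LAT.get(lower, ch)
        if ch.isupper():
            # Carry capitalization over to the Latin output ("Ш" -> "Sh").
            lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


if __name__ == "__main__":
    for w in ["ўзбек", "Шаҳар", "ер"]:
        print(w, "->", cyrillic_to_latin(w))
```

A pure lookup table like this handles the regular cases; the exceptions that such rules cannot capture are presumably where the paper's additional fine-tuning comes in.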
