A machine transliteration tool between Uzbek alphabets
This tool addresses the need for script conversion in Uzbek, a low-resource language, but is incremental as it applies existing methods to a new dataset.
The paper tackles machine transliteration for the low-resource Uzbek language by developing a tool that converts words between three scripts (the old Cyrillic, the official Latin, and the new Latin alphabets). The result is an open-source Python package and a web application with a public API.
Machine transliteration, as defined in this paper, is the process of automatically converting words written in a source alphabet into a target alphabet of the same language while preserving their meaning and pronunciation. The main goal of this paper is to present a machine transliteration tool for the three common scripts of the low-resource Uzbek language: the old Cyrillic, the currently official Latin, and the newly announced New Latin alphabets. The tool was built using a combination of rule-based and fine-tuning approaches. It is available as an open-source Python package and as a web-based application with a public API. To our knowledge, this is the first machine transliteration tool that supports the newly announced Latin alphabet of the Uzbek language.
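To make the rule-based part of this definition concrete, the following is a minimal sketch of character-level transliteration from Cyrillic to the official Latin script. It is not the paper's package or its API: the function name `transliterate` and the table `CYR_TO_LAT` are hypothetical, the table covers only a handful of letters, and context-dependent rules (for example, letters whose Latin form depends on their position in the word) are deliberately left out.

```python
# Illustrative subset of Cyrillic-to-Latin letter rules (assumption: this is
# NOT the tool's full rule set, only enough letters for the examples below).
CYR_TO_LAT = {
    "а": "a", "б": "b", "и": "i", "к": "k", "о": "o", "т": "t",
    "р": "r", "қ": "q", "ҳ": "h", "ш": "sh", "ч": "ch",
}


def transliterate(text: str, rules: dict[str, str] = CYR_TO_LAT) -> str:
    """Map each character through the rule table, preserving letter case.

    Characters without a rule (punctuation, digits, unmapped letters)
    are copied through unchanged.
    """
    out = []
    for ch in text:
        lower = ch.lower()
        mapped = rules.get(lower)
        if mapped is None:
            out.append(ch)                    # no rule: keep the character as-is
        elif ch != lower:
            out.append(mapped.capitalize())   # uppercase source: "Ш" -> "Sh"
        else:
            out.append(mapped)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("китоб"))
    print(transliterate("Шаҳар"))
```

A plain lookup table like this only handles one-to-one and one-to-many letter mappings. A real rule set would also need positional and exception rules, which is presumably where a combined approach such as the one described above becomes useful.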