CLAug 23, 2022

MATra: A Multilingual Attentive Transliteration System for Indian Scripts

arXiv:2208.10801v16 citationsh-index: 1
Originality Incremental advance
AI Analysis

It addresses a gap in NLP for Indic language transliteration, enabling better communication in written tasks where language barriers exist, though it is incremental as it builds on existing transformer methods.

The paper tackles the problem of transliteration between Indian languages and English, where existing models are limited, by introducing a multilingual transformer-based system that achieves a top-1 accuracy of 80.7%, which is 29.5% higher than state-of-the-art results, and a phonetic accuracy of 93.5%.

Transliteration is a task in the domain of NLP where the output word is a similar-sounding word written using the letters of any foreign language. Today this system has been developed for several language pairs that involve English as either the source or target word and deployed in several places like Google Translate and chatbots. However, there is very little research done in the field of Indic languages transliterated to other Indic languages. This paper demonstrates a multilingual model based on transformers (with some modifications) that can give noticeably higher performance and accuracy than all existing models in this domain and get much better results than state-of-the-art models. This paper shows a model that can perform transliteration between any pair among the following five languages - English, Hindi, Bengali, Kannada and Tamil. It is applicable in scenarios where language is a barrier to communication in any written task. The model beats the state-of-the-art (for all pairs among the five mentioned languages - English, Hindi, Bengali, Kannada, and Tamil) and achieves a top-1 accuracy score of 80.7%, about 29.5% higher than the best current results. Furthermore, the model achieves 93.5% in terms of Phonetic Accuracy (transliteration is primarily a phonetic/sound-based task).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes