CL Aug 7, 2018

Design Challenges in Named Entity Transliteration

arXiv:1808.02563v132.01093 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses named entity transliteration for multilingual NLP applications. Its contribution is incremental: it focuses on dataset curation and method comparison rather than introducing a new paradigm.

The paper analyzes the design challenges of building a multilingual named entity transliteration system, evaluates traditional and neural methods, and releases bilingual personal name dictionaries mined from Wikidata for English to Russian, Hebrew, Arabic, and Japanese Katakana.

We analyze some of the fundamental design challenges that impact the development of a multilingual state-of-the-art named entity transliteration system, including curating bilingual named entity datasets and evaluating multiple transliteration methods. We empirically evaluate the transliteration task using the traditional weighted finite state transducer (WFST) approach against two neural approaches: the encoder-decoder recurrent neural network method and the recent, non-sequential Transformer method. In order to improve the availability of bilingual named entity transliteration datasets, we release personal name bilingual dictionaries mined from Wikidata for English to Russian, Hebrew, Arabic and Japanese Katakana. Our code and dictionaries are publicly available.
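The abstract does not describe how the dictionaries were mined, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. It assumes entity records shaped like Wikidata's JSON export (a `labels` map keyed by language code, and `claims` keyed by property ID) and keeps only entities marked as instance of (P31) human (Q5). The function names `is_person` and `name_pairs` and the sample records are hypothetical, introduced for illustration.

```python
from typing import Iterable, Iterator, Tuple

HUMAN_QID = "Q5"  # Wikidata item for "human"


def is_person(entity: dict) -> bool:
    """Return True if any 'instance of' (P31) statement points at Q5."""
    for statement in entity.get("claims", {}).get("P31", []):
        value = (
            statement.get("mainsnak", {})
            .get("datavalue", {})
            .get("value", {})
        )
        if isinstance(value, dict) and value.get("id") == HUMAN_QID:
            return True
    return False


def name_pairs(
    entities: Iterable[dict], target_lang: str
) -> Iterator[Tuple[str, str]]:
    """Yield (English label, target-language label) for person entities.

    Entities missing either label are skipped. This is an assumed
    filtering rule, not one stated in the source.
    """
    for entity in entities:
        if not is_person(entity):
            continue
        labels = entity.get("labels", {})
        english = labels.get("en", {}).get("value")
        target = labels.get(target_lang, {}).get("value")
        if english and target:
            yield english, target


# Illustrative sample records in the assumed Wikidata-like shape.
SAMPLE = [
    {
        "id": "Q42",
        "labels": {
            "en": {"language": "en", "value": "Douglas Adams"},
            "ru": {"language": "ru", "value": "Дуглас Адамс"},
        },
        "claims": {
            "P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]
        },
    },
    {
        "id": "Q64",
        "labels": {
            "en": {"language": "en", "value": "Berlin"},
            "ru": {"language": "ru", "value": "Берлин"},
        },
        "claims": {
            "P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q515"}}}}]
        },
    },
]

if __name__ == "__main__":
    for english, russian in name_pairs(SAMPLE, "ru"):
        print(f"{english}\t{russian}")
```

Restricting to person entities mirrors the "personal name" scope stated in the abstract. A real pipeline would likely also need script filtering and deduplication, which this sketch omits.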
