CL, Sep 14, 2016

Transliteration in Any Language with Surrogate Languages

arXiv:1609.04325v1, 2 citations
Originality: Incremental advance
AI Analysis

This work addresses a limitation of previous transliteration methods, which covered only the languages available in Wikipedia, and so enables broader multilingual applications.

The paper tackles the problem of generating transliterations for any language by using Wikipedia data as surrogate training data and ranking Wikipedia languages by their suitability for each target language. The approach achieves performance comparable to an oracle ceiling and sometimes exceeds it.

We introduce a method for transliteration generation that can produce transliterations in every language. Where previous results are only as multilingual as Wikipedia, we show how to use training data from Wikipedia as surrogate training for any language. Thus, the problem becomes one of ranking Wikipedia languages in order of suitability with respect to a target language. We introduce several task-specific methods for ranking languages, and show that our approach is comparable to the oracle ceiling, and even outperforms it in some cases.
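
The abstract frames the task as ranking Wikipedia languages by suitability for a target language, but it does not say which ranking methods are used. The sketch below is only an illustration of that framing, under a stand-in assumption: candidates are scored by how much of the target's character-bigram mass they cover. The functions `char_ngrams`, `overlap_score`, and `rank_surrogates`, and the toy word lists, are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(words, n=2):
    """Count character n-grams over a list of words, with boundary markers."""
    counts = Counter()
    for w in words:
        padded = f"^{w}$"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def overlap_score(target_counts, candidate_counts):
    """Fraction of the target's n-gram mass that also occurs in the candidate."""
    total = sum(target_counts.values())
    if total == 0:
        return 0.0
    shared = sum(c for g, c in target_counts.items() if g in candidate_counts)
    return shared / total


def rank_surrogates(target_words, candidates, n=2):
    """Return (language, score) pairs, best surrogate first.

    Assumption: n-gram overlap is a stand-in heuristic, not the paper's method.
    """
    target_counts = char_ngrams(target_words, n)
    scored = [
        (lang, overlap_score(target_counts, char_ngrams(words, n)))
        for lang, words in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up data: a small target word list and two candidate languages.
target_words = ["mama", "dada"]
candidates = {"A": ["mamma", "dadda"], "B": ["xyz"]}
ranking = rank_surrogates(target_words, candidates)
```

In a setup like this, the top-ranked language's Wikipedia name pairs would serve as the surrogate training data for the target language.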

