Orthographic Syllable as basic unit for SMT between Related Languages
This work addresses translation between low-resource related languages, though it is an incremental improvement over existing methods built on other translation units.
The paper tackles machine translation between related languages that use abugida or alphabetic scripts, using the orthographic syllable as the basic unit of translation. It shows that this approach significantly outperforms word-, morpheme-, and character-level models when trained on small parallel corpora.
We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
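To make the consonant-vowel unit concrete, the following is a minimal sketch of orthographic syllable segmentation for a single Latin-script word. It is an illustration under stated assumptions, not the segmenter used in the paper: the vowel set `aeiou`, the function name `segment_orthographic_syllables`, and the choice to emit trailing consonants as their own unit are all assumptions made here. Abugida scripts would need script-specific character classes (for example dependent vowel signs), which this sketch does not handle.

```python
import re
from typing import List

# Assumption: a simplified vowel inventory for lowercase Latin-script text.
VOWELS = "aeiou"

# A unit is zero or more consonants followed by one vowel;
# consonants left over at the end of the word form a final unit.
_UNIT = re.compile(rf"[^{VOWELS}]*[{VOWELS}]|[^{VOWELS}]+")


def segment_orthographic_syllables(word: str) -> List[str]:
    """Split one word into variable-length consonant-vowel units.

    Hypothetical helper for illustration; expects a single alphabetic word.
    """
    return _UNIT.findall(word.lower())


if __name__ == "__main__":
    print(segment_orthographic_syllables("spacecraft"))
    # → ['spa', 'ce', 'cra', 'ft']
```

Because the units are variable-length, a sentence segmented this way is shorter than its character sequence but draws on a far smaller vocabulary than its word sequence, which fits the small-corpus setting described above.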