Unsupervised Bilingual Lexicon Induction Across Writing Systems
This work addresses cross-lingual word alignment for languages with different scripts, though the contribution is incremental, since it builds on existing methods.
The paper tackles unsupervised bilingual lexicon induction across languages with different writing systems by augmenting a state-of-the-art method with orthographic features, which improves performance on three language pairs with varying lexical similarity.
Recent embedding-based methods in unsupervised bilingual lexicon induction have shown good results, but generally have not leveraged orthographic (spelling) information, which can be helpful for pairs of related languages. This work augments a state-of-the-art method with orthographic features, and extends prior work in this space by proposing methods that can learn and utilize orthographic correspondences even between languages with different scripts. We demonstrate this by experimenting on three language pairs with different scripts and varying degrees of lexical similarity.
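To make the idea of combining embedding similarity with orthographic information concrete, the following is a minimal illustrative sketch and not the paper's actual method. It assumes a hand-written character mapping (`char_map`) between scripts, whereas the abstract describes learning such correspondences; the function names, the interpolation weight `alpha`, and the use of plain Levenshtein distance are all assumptions made for illustration.

```python
import math


def levenshtein(a, b):
    """Standard edit distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orth_similarity(src, tgt, char_map):
    """Map source characters into the target script, then return
    1 - normalized edit distance. char_map is a hypothetical, hand-written
    table here; unmapped characters are left unchanged."""
    mapped = "".join(char_map.get(ch, ch) for ch in src)
    longest = max(len(mapped), len(tgt))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(mapped, tgt) / longest


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_score(src_word, tgt_word, src_vec, tgt_vec, char_map, alpha=0.5):
    """Interpolate embedding similarity with orthographic similarity.
    alpha is an assumed weight on the orthographic term."""
    return ((1 - alpha) * cosine(src_vec, tgt_vec)
            + alpha * orth_similarity(src_word, tgt_word, char_map))
```

With a toy Cyrillic-to-Latin map such as `{"б": "b", "а": "a", "н": "n", "к": "k"}`, `orth_similarity("банк", "bank", char_map)` treats the two words as spelled identically, so the orthographic term can reinforce a candidate translation even when the embedding similarity alone is ambiguous.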