Learning Bilingual Word Embeddings Using Lexical Definitions
This work addresses the need for efficient cross-lingual NLP tools by reducing reliance on costly resources, though the contribution is incremental in that it builds on existing embedding methods.
The paper tackles the problem of training bilingual word embeddings without expensive seed lexicons or noisy parallel sentences. It proposes BilLex, which leverages lexical definitions for automatic fine-grained word alignment and significantly outperforms previous methods on word-level and sentence-level translation tasks.
Bilingual word embeddings, which represent lexicons of different languages in a shared embedding space, are essential for supporting semantic and knowledge transfers in a variety of cross-lingual NLP tasks. Existing approaches to training bilingual word embeddings often require predefined seed lexicons that are expensive to obtain, or parallel sentences that comprise coarse and noisy alignment. In contrast, we propose BilLex, which leverages publicly available lexical definitions for bilingual word embedding learning. Without the need for predefined seed lexicons, BilLex comprises a novel word pairing strategy to automatically identify and propagate the precise fine-grained word alignment from lexical definitions. We evaluate BilLex in word-level and sentence-level translation tasks, which seek to find the cross-lingual counterparts of words and sentences respectively. BilLex significantly outperforms previous embedding methods on both tasks.
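To make the word-level translation task concrete, the following is a minimal sketch of how a cross-lingual counterpart can be looked up once two languages share an embedding space. It assumes cosine similarity as the retrieval measure, which the abstract does not specify. The function names `cosine` and `translate` and the toy vectors are hypothetical illustrations, not BilLex's implementation or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def translate(word, source_vectors, target_vectors, k=1):
    """Return the k target-language words nearest to `word` in the shared space."""
    query = source_vectors[word]
    ranked = sorted(
        target_vectors,
        key=lambda t: cosine(query, target_vectors[t]),
        reverse=True,
    )
    return ranked[:k]


# Toy, hand-made vectors (assumption: purely illustrative, not learned embeddings).
english = {"dog": [0.9, 0.1, 0.0], "house": [0.0, 0.2, 0.9]}
french = {
    "chien": [0.8, 0.2, 0.1],
    "maison": [0.1, 0.1, 0.95],
    "chat": [0.6, 0.7, 0.0],
}

print(translate("dog", english, french))
print(translate("house", english, french))
```

In this sketch, translation quality depends entirely on how well the two vocabularies are aligned in the shared space, which is the part the proposed word pairing strategy is meant to supply.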