Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction
This work addresses BLI for cross-lingual NLP tasks, offering an incremental improvement by integrating two existing representation types.
The paper tackles Bilingual Lexicon Induction (BLI) by combining static word embeddings and contextual representations, yielding average improvements over BLI baselines of 3.2 points in the supervised setting and 3.1 points in the unsupervised setting across various language pairs.
Bilingual Lexicon Induction (BLI) aims to map words in one language to their translations in another, typically by learning linear projections that align monolingual word representation spaces. Two classes of word representations have been explored for BLI, static word embeddings and contextual representations, but no prior study has combined the two. In this paper, we propose a simple yet effective mechanism that combines static word embeddings with contextual representations to exploit the advantages of both paradigms. We test the combination mechanism on various language pairs under supervised and unsupervised BLI benchmark settings. Experiments show that our mechanism consistently improves performance over robust BLI baselines on all language pairs, with average gains of 3.2 points in the supervised setting and 3.1 points in the unsupervised setting.
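To make the "linear projection" idea concrete, the sketch below shows a common supervised BLI baseline: orthogonal Procrustes alignment fitted on a seed dictionary, followed by cosine nearest-neighbour retrieval. It is not the combination mechanism proposed in this paper, which the abstract does not specify. The function names `procrustes` and `induce_lexicon` and the synthetic toy data are assumptions made purely for illustration.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F (rows of X, Y are seed pairs)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce_lexicon(X_src, Y_tgt, W):
    """For each source row, index of the target row with highest cosine similarity."""
    P = X_src @ W
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    T = Y_tgt / np.linalg.norm(Y_tgt, axis=1, keepdims=True)
    return (P @ T.T).argmax(axis=1)

# Toy data (hypothetical): the source space is an exact rotation of the target
# space, so source word i translates to target word i by construction.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 5))                    # target "embeddings"
R = np.linalg.qr(rng.standard_normal((5, 5)))[0]    # random orthogonal map
X = Y @ R.T                                         # source "embeddings": X @ R == Y

W = procrustes(X[:10], Y[:10])      # fit on a 10-pair seed dictionary
pred = induce_lexicon(X, Y, W)      # induce translations for all 20 words
```

In this toy setting the learned `W` recovers `R`, so every source word retrieves its own translation. Real embedding spaces are only approximately isomorphic, which is one reason richer representations may help.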