Cross-lingual Models of Word Embeddings: An Empirical Comparison
This work helps NLP researchers and practitioners select effective cross-lingual embedding methods. Its contribution is incremental: it offers an empirical comparison rather than a new technique.
The paper addresses the lack of a systematic comparison of cross-lingual word embedding models by empirically evaluating four approaches on four language pairs across multiple tasks. It finds that models requiring expensive cross-lingual knowledge generally perform better, but that cheaply supervised models can be competitive on certain tasks.
Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches to inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on monolingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.
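To make the intrinsic cross-lingual similarity evaluation concrete, the following is a minimal sketch and not the paper's actual protocol. It assumes both languages' embeddings already live in one shared vector space, scores each word pair by cosine similarity, and compares the model's scores with human ratings using Spearman rank correlation, which is common practice for word similarity benchmarks. The toy vectors, the ratings, and the function names (`cosine`, `spearman`, `evaluate_similarity`) are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_similarity(pairs, src_vectors, tgt_vectors):
    """Correlate model cosine scores with human ratings.

    `pairs` holds (source_word, target_word, human_rating) triples.
    Pairs with a word missing from either vocabulary are skipped.
    """
    model_scores, human_scores = [], []
    for src_word, tgt_word, rating in pairs:
        if src_word in src_vectors and tgt_word in tgt_vectors:
            model_scores.append(cosine(src_vectors[src_word], tgt_vectors[tgt_word]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data: 2-dimensional English and German vectors in a shared space.
english = {"dog": [1.0, 0.0], "cat": [0.8, 0.6], "car": [0.0, 1.0]}
german = {"Hund": [0.9, 0.1], "Auto": [0.2, 0.8]}
rated_pairs = [
    ("dog", "Hund", 9.5),
    ("cat", "Hund", 6.0),
    ("car", "Hund", 1.0),
    ("car", "Auto", 9.0),
]

rho = evaluate_similarity(rated_pairs, english, german)
print(f"Spearman correlation: {rho:.3f}")
```

Rank correlation is used here rather than a direct comparison of raw values because cosine scores and human ratings sit on different scales; only the ordering of the pairs is compared. The same routine covers the monolingual case if both dictionaries come from one language.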