A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
This work addresses the challenge of improving word representations, particularly for under-resourced languages, through cross-lingual transfer learning. The contribution is incremental, since it builds on existing embedding techniques.
The paper creates monolingual and cross-lingual meta-embeddings by integrating multiple word embeddings from diverse sources and projecting them into a common semantic space. It reports state-of-the-art performance on word similarity and POS tagging for English and Spanish.
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases and languages. Existing word vectors are projected to a common semantic space using linear transformations and averaging. With our method, the resulting meta-embeddings maintain the dimensionality of the original embeddings without losing information, while also dealing with the out-of-vocabulary problem. An extensive empirical evaluation demonstrates the effectiveness of our technique with respect to previous work on various intrinsic and extrinsic multilingual evaluations, obtaining competitive results for Semantic Textual Similarity and state-of-the-art performance for word similarity and POS tagging (English and Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities. In other words, we can leverage pre-trained source embeddings from a resource-rich language to improve the word representations for under-resourced languages.
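The projection-and-averaging idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the linear transformations are learned, so this example assumes an orthogonal Procrustes mapping fitted on the shared vocabulary; that choice, the function names `procrustes_map` and `meta_embed`, and the simple fallback for words found in only one source are all illustrative assumptions, not the authors' implementation. Both source embeddings are assumed to have the same dimensionality.

```python
import numpy as np


def procrustes_map(src, tgt):
    """Return the orthogonal matrix W that minimises ||src @ W - tgt||_F.

    Assumption: orthogonal Procrustes stands in for the unspecified
    linear transformation mentioned in the abstract.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def meta_embed(emb_a, emb_b):
    """Build meta-embeddings from two word -> vector dictionaries.

    emb_b is projected into the space of emb_a, then the vectors
    available for each word are averaged. Hypothetical helper for
    illustration only.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    a = np.stack([emb_a[w] for w in shared])
    b = np.stack([emb_b[w] for w in shared])
    w_map = procrustes_map(b, a)  # learned on the shared vocabulary only

    meta = {}
    for word in set(emb_a) | set(emb_b):
        vecs = []
        if word in emb_a:
            vecs.append(emb_a[word])
        if word in emb_b:
            vecs.append(emb_b[word] @ w_map)
        # Words missing from one source keep the vector they do have
        # (a simplification of out-of-vocabulary handling).
        meta[word] = np.mean(vecs, axis=0)
    return meta
```

Because the mapping is orthogonal and the vectors are averaged rather than concatenated, the output keeps the dimensionality of the inputs, and the vocabulary of the result is the union of the source vocabularies.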