Multilingual Factor Analysis
This work addresses the challenge of aligning multilingual embeddings for applications in natural language processing, but it is incremental as it builds on existing latent variable models.
The authors tackled the problem of learning multilingual word representations by fitting a generative latent variable model to a multilingual dictionary, modeling words across languages as views of a common latent lexical meaning, and achieved competitive results on various tasks.
In this work we approach the task of learning multilingual word representations in an offline manner by fitting a generative latent variable model to a multilingual dictionary. We model equivalent words in different languages as different views of the same word generated by a common latent variable representing their latent lexical meaning. We explore the task of alignment by querying the fitted model for multilingual embeddings achieving competitive results across a variety of tasks. The proposed model is robust to noise in the embedding space making it a suitable method for distributed representations learned from noisy corpora.