Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization
This work improves the semantic similarity and multilinguality of sentence embeddings for tasks such as cross-lingual intent classification, an incremental improvement over existing methods.
The paper introduces Emu, a system that fine-tunes pre-trained multilingual sentence embeddings with a semantic classifier and a language discriminator. The resulting embeddings outperform the state-of-the-art multilingual sentence embedding model on cross-lingual intent classification while using only monolingual labeled data.
We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.
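To make the interplay of the two components concrete, the following is a minimal sketch of how an adversarial fine-tuning objective for the embedding model might be combined. It is an illustration under stated assumptions, not the formulation used by Emu: the cross-entropy losses, the weight `lam`, and the function names `cross_entropy` and `encoder_objective` are all hypothetical and are not specified in the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


def encoder_objective(intent_logits, intent_label,
                      lang_logits, lang_label, lam=0.1):
    """Hypothetical objective minimized by the embedding model.

    Assumption: the semantic classifier predicts an intent label and the
    language discriminator predicts the language of the sentence. The
    embedding model is rewarded for a low semantic loss and for a HIGH
    discriminator loss, i.e. for embeddings whose language is hard to guess.
    `lam` is an assumed trade-off weight, not a value from the paper.
    """
    semantic_loss = cross_entropy(intent_logits, intent_label)
    discriminator_loss = cross_entropy(lang_logits, lang_label)
    return semantic_loss - lam * discriminator_loss


# Same intent prediction, two different discriminator outcomes.
confident = encoder_objective([2.0, 0.0], 0, [5.0, 0.0], 0)
confused = encoder_objective([2.0, 0.0], 0, [0.0, 0.0], 0)
print(confused < confident)
```

In this sketch the objective is lower when the discriminator cannot tell the languages apart, which is the sense in which adversarial training pushes the embeddings toward multilinguality. In a full training loop the discriminator would be updated separately to minimize its own loss.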