CLJul 9, 2019

Multilingual Universal Sentence Encoder for Semantic Retrieval

arXiv:1907.04307v11108 citations
Originality Incremental advance
AI Analysis

This work addresses the need for efficient multilingual semantic retrieval tools, offering a practical solution for cross-lingual applications, though it is incremental as it builds on existing dual-encoder and translation-based methods.

The authors tackled the problem of multilingual semantic retrieval by introducing two pre-trained sentence encoding models based on Transformer and CNN architectures, which embed text from 16 languages into a single semantic space and achieve competitive performance with state-of-the-art on tasks like semantic retrieval, translation pair bitext retrieval, and retrieval question answering.

We introduce two pre-trained retrieval focused multilingual sentence encoding models, respectively based on the Transformer and CNN model architectures. The models embed text from 16 languages into a single semantic space using a multi-task trained dual-encoder that learns tied representations using translation based bridge tasks (Chidambaram al., 2018). The models provide performance that is competitive with the state-of-the-art on: semantic retrieval (SR), translation pair bitext retrieval (BR) and retrieval question answering (ReQA). On English transfer learning tasks, our sentence-level embeddings approach, and in some cases exceed, the performance of monolingual, English only, sentence embedding models. Our models are made available for download on TensorFlow Hub.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes