CLApr 30, 2021

Paraphrastic Representations at Scale

arXiv:2104.15114v2296 citations
Originality Incremental advance
AI Analysis

This provides a scalable and efficient solution for users needing high-quality sentence embeddings across languages, especially in resource-constrained environments.

The authors developed a system for training state-of-the-art paraphrastic sentence representations in multiple languages, achieving significantly improved performance on semantic similarity and bitext mining tasks, with models that are orders of magnitude faster and usable on CPU.

We present a system that allows users to train their own state-of-the-art paraphrastic sentence representations in a variety of languages. We also release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese. We train these models on large amounts of data, achieving significantly improved performance from the original papers proposing the methods on a suite of monolingual semantic similarity, cross-lingual semantic similarity, and bitext mining tasks. Moreover, the resulting models surpass all prior work on unsupervised semantic textual similarity, significantly outperforming even BERT-based models like Sentence-BERT (Reimers and Gurevych, 2019). Additionally, our models are orders of magnitude faster than prior work and can be used on CPU with little difference in inference speed (even improved speed over GPU when using more CPU cores), making these models an attractive choice for users without access to GPUs or for use on embedded devices. Finally, we add significantly increased functionality to the code bases for training paraphrastic sentence models, easing their use for both inference and for training them for any desired language with parallel data. We also include code to automatically download and preprocess training data.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes