CL | LG | Oct 29, 2019

Sentence Embeddings for Russian NLU

arXiv:1910.13291v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of benchmarks for Russian language understanding, providing datasets and performance insights for researchers in NLP.

The study evaluated sentence embedding models on Russian NLU tasks, finding that BERT embeddings outperformed FastText and ELMo, with BERT achieving up to 85% accuracy on paraphrase identification.

We investigate the performance of sentence embedding models on several tasks for the Russian language. Our comparison covers multiple choice question answering, next sentence prediction, and paraphrase identification. We employ FastText embeddings as a baseline and compare them to ELMo and BERT embeddings. We conduct two series of experiments, using both unsupervised (i.e., based on a similarity measure only) and supervised approaches for the tasks. Finally, we present datasets for multiple choice question answering and next sentence prediction in Russian.
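To make the unsupervised setting concrete, here is a minimal sketch of similarity-only multiple choice question answering: embed the question and each candidate answer, then pick the candidate closest to the question. The abstract does not say which pooling or similarity measure is used, so mean pooling and cosine similarity are assumptions here. The toy 3-dimensional vectors and the names `mean_pool`, `cosine`, and `pick_answer` are hypothetical and stand in for real FastText, ELMo, or BERT embeddings.

```python
import math

def mean_pool(tokens, vectors):
    """Average the vectors of known tokens to get a sentence embedding (assumed pooling)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_answer(question, candidates, vectors):
    """Return the index of the candidate most similar to the question, plus all scores."""
    q = mean_pool(question.lower().split(), vectors)
    scores = [cosine(q, mean_pool(c.lower().split(), vectors)) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Toy vectors, purely illustrative; not taken from any trained model.
TOY_VECTORS = {
    "столица": [1.0, 0.0, 0.1],
    "россии": [0.9, 0.1, 0.0],
    "москва": [0.8, 0.2, 0.1],
    "яблоко": [0.0, 1.0, 0.2],
    "река": [0.1, 0.3, 1.0],
}

best, scores = pick_answer("столица россии", ["яблоко", "москва", "река"], TOY_VECTORS)
print(best, [round(s, 3) for s in scores])
```

No parameters are trained in this setup, which is what separates it from the supervised series of experiments, where a classifier would be fit on top of the embeddings.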

Code Implementations: 1 repo