CL · Mar 15, 2018

RUSSE: The First Workshop on Russian Semantic Similarity

arXiv:1803.05820v1 · 43 citations
Originality: Synthesis-oriented
AI Analysis

This work provides the first evaluation methodology for semantic similarity in Russian, addressing a gap for researchers and practitioners in natural language processing for Slavic languages, though it is incremental as it applies existing methods to a new language.

The paper addresses the lack of semantic similarity analysis for Russian by proposing a shared task and creating four novel benchmark datasets. Results from 105 submissions show that methods successful for English, such as distributional and skip-gram models, are applicable to Russian: the best results came from supervised models, while completely unsupervised approaches still ranked among the top 5.

The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such measures was ever performed for the Russian language. Exploring this problem for the Russian language is even more interesting, because this language has features, such as rich morphology and free word order, which make it significantly different from English, German, and other well-studied languages. We attempt to bridge this gap by proposing a shared task on the semantic similarity of Russian nouns. Our key contribution is an evaluation methodology based on four novel benchmark datasets for the Russian language. Our analysis of the 105 submissions from 19 teams reveals that successful approaches for English, such as distributional and skip-gram models, are directly applicable to Russian as well. On the one hand, the best results in the contest were obtained by sophisticated supervised models that combine evidence from different sources. On the other hand, completely unsupervised approaches, such as a skip-gram model estimated on a large-scale corpus, were able to score among the top 5 systems.

