CL · Oct 13, 2020

RuSemShift: a dataset of historical lexical semantic change in Russian

arXiv:2010.06436v1994 citations
Originality: Synthesis-oriented
AI Analysis

RuSemShift is a domain-specific resource for computational linguistics researchers studying semantic change in Russian. The contribution is incremental, since it builds on existing frameworks.

The authors tackled the problem of modeling historical lexical semantic change in Russian by creating RuSemShift, a large-scale manually annotated dataset covering pre-Soviet to post-Soviet times, and reported promising but improvable results from distributional approaches.

We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times. Target words were annotated by multiple crowd-source workers. The annotation process was organized following the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift, achieving promising results, which at the same time leave room for other researchers to improve.
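As a rough illustration of how a distributional approach might be scored against a test set of this kind, the sketch below computes a change score per target word as the cosine distance between its vectors from two time periods, then rank-correlates those scores with gold annotations. Everything in it is an assumption made for illustration: the word labels, the toy vectors, the gold values, the convention that a higher gold value means more change, and the choice of cosine distance with Spearman correlation. It is not taken from the paper, and it assumes the two periods' vectors already live in a shared space.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


# Hypothetical toy vectors for three placeholder target words,
# assumed to be aligned across the two periods.
earlier = {"word_a": [1.0, 0.0], "word_b": [1.0, 1.0], "word_c": [0.0, 1.0]}
later = {"word_a": [1.0, 0.0], "word_b": [1.0, 0.0], "word_c": [1.0, 0.0]}

# Hypothetical gold scores; higher is assumed to mean more change.
gold = {"word_a": 0.1, "word_b": 0.5, "word_c": 0.9}

words = sorted(gold)
predicted = [cosine_distance(earlier[w], later[w]) for w in words]
rho = spearman(predicted, [gold[w] for w in words])
print(f"Spearman correlation on toy data: {rho:.3f}")
```

With real data, the toy dictionaries would be replaced by vectors from models trained on each period's texts and by the dataset's annotated scores, after checking which direction the gold scale runs.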

