RuSemShift: a dataset of historical lexical semantic change in Russian
RuSemShift provides a domain-specific resource for computational linguists studying semantic change in Russian, though its contribution is incremental because it builds on existing frameworks such as DURel.
The authors address the problem of modeling historical lexical semantic change in Russian by creating RuSemShift, a large-scale manually annotated dataset covering the pre-Soviet, Soviet, and post-Soviet periods, and report promising results from distributional approaches that still leave room for improvement.
We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian, covering two long-term time period pairs: from the pre-Soviet to the Soviet period and from the Soviet to the post-Soviet period. Target words were annotated by multiple crowdsourced workers. The annotation process followed the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift; the results are promising but leave room for improvement by other researchers.
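To illustrate how a distributional approach might be scored against a test set of this kind, here is a minimal sketch. It is not the authors' evaluation code. It assumes that each target word has one vector per time period in a shared (already aligned) space, that the gold score is a mean human relatedness rating in which higher values mean a more stable meaning, and that agreement is measured with Spearman rank correlation. The words, vectors, and ratings below are hypothetical toy values, not data from RuSemShift.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: (earlier-period vector, later-period vector) per word.
vectors = {
    "word_a": ([1.0, 0.0], [1.0, 0.0]),  # unchanged usage
    "word_b": ([1.0, 0.0], [1.0, 1.0]),  # moderate drift
    "word_c": ([1.0, 0.0], [0.0, 1.0]),  # strong drift
}
# Hypothetical mean relatedness ratings (higher = more stable meaning).
gold_relatedness = {"word_a": 3.8, "word_b": 2.5, "word_c": 1.2}

words = sorted(vectors)
similarities = [cosine_similarity(*vectors[w]) for w in words]
gold = [gold_relatedness[w] for w in words]

print("Spearman correlation:", spearman(similarities, gold))
```

In practice one would likely replace the hand-written correlation with `scipy.stats.spearmanr`; the pure-Python version is used here only to keep the sketch dependency-free.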