CL, Sep 14, 2021

A system for information extraction from scientific texts in Russian

Elena Bruches, Anastasia Mezentseva, Tatiana Batura

arXiv:2109.06703v1

0.2

Originality: Synthesis-oriented

AI Analysis

The paper addresses the processing of Russian-language scientific texts for information retrieval and recommendation systems, but its contribution is incremental: it applies existing methods to a new language domain.

The paper tackles information extraction from Russian scientific texts with an end-to-end system that performs term recognition, relation extraction, and term linking. Because the system does not require large labeled datasets, it can be applied in low-resource settings.

In this paper, we present a system for information extraction from scientific texts in the Russian language. The system performs several tasks in an end-to-end manner: term recognition, extraction of relations between terms, and term linking with entities from the knowledge base. These tasks are extremely important for information retrieval, recommendation systems, and classification. The advantage of the implemented methods is that the system does not require a large amount of labeled data, which saves time and effort on data labeling, so the system can be applied in low- and mid-resource settings. The source code is publicly available and can be used for different research purposes.
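To make the three-stage flow concrete, here is a minimal Python sketch of how term recognition, relation extraction, and term linking could be chained end to end. It is not the authors' implementation: the lexicon, the knowledge-base identifiers, the `RELATED` label, and every function name are hypothetical, and simple string matching stands in for whatever models the actual system uses.

```python
from dataclasses import dataclass

# Hypothetical toy resources; none of this data comes from the paper.
TERM_LEXICON = {"нейронная сеть", "машинное обучение"}
KNOWLEDGE_BASE = {"нейронная сеть": "KB:001", "машинное обучение": "KB:002"}


@dataclass(frozen=True)
class Term:
    text: str   # normalized (lowercased) term
    start: int  # character offset where the term begins
    end: int    # character offset just past the term


def recognize_terms(text):
    """Stage 1 (stand-in): find lexicon entries by case-insensitive search."""
    lowered = text.lower()
    found = []
    for entry in TERM_LEXICON:
        pos = lowered.find(entry)
        while pos != -1:
            found.append(Term(entry, pos, pos + len(entry)))
            pos = lowered.find(entry, pos + 1)
    return sorted(found, key=lambda t: t.start)


def extract_relations(terms):
    """Stage 2 (stand-in): pair neighbouring terms under a generic label."""
    return [(a, "RELATED", b) for a, b in zip(terms, terms[1:])]


def link_terms(terms):
    """Stage 3 (stand-in): exact-match lookup; None means no entry found."""
    return {t: KNOWLEDGE_BASE.get(t.text) for t in terms}


def run_pipeline(text):
    """Run the three stages in order and collect their outputs."""
    terms = recognize_terms(text)
    return {
        "terms": terms,
        "relations": extract_relations(terms),
        "links": link_terms(terms),
    }
```

Calling `run_pipeline` on a sentence returns the recognized terms, the relations between them, and a mapping from each term to a knowledge-base identifier. Each stage consumes only the output of the previous one, so any stand-in can be swapped for a trained model without changing the rest of the chain.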
