CL · Sep 29, 2022

TERMinator: A system for scientific texts processing

arXiv:2209.14854v1580 citations · h-index: 9 · Has Code
Originality Synthesis-oriented
AI Analysis

This work addresses the extraction of scientific terms and relations for researchers in natural language processing, but it is incremental as it builds on existing methods with a new dataset and tool.

The authors tackled the problem of extracting entities and semantic relations from scientific texts by creating a new annotated dataset and developing the TERMinator system to study language models and heuristic approaches for term recognition and relation extraction. The results showed that language models pre-trained on the target language do not always perform best, and adding heuristics can improve task-specific quality, with the tool and corpus made publicly available.

This paper is devoted to the extraction of entities and the semantic relations between them from scientific texts, where we treat scientific terms as entities. We present a dataset that includes annotations for two tasks and develop a system called TERMinator to study the influence of language models on term recognition and to compare different approaches to relation extraction. Experiments show that language models pre-trained on the target language do not always show the best performance. Adding some heuristic approaches may also improve the overall quality on a particular task. The developed tool and the annotated corpus are publicly available at https://github.com/iis-research-team/terminator and may be useful for other researchers.

