Atypical lexical abbreviations identification in Russian medical texts
This work addresses the challenge of abbreviation comprehension for readers of Russian medical texts, though it is incremental as it applies existing ML methods to a new domain-specific dataset.
The paper tackles the problem of identifying atypical lexical abbreviations in Russian medical texts, achieving a ROC AUC score of 0.926 and an F1 score of 0.706, which are competitive with the baselines.
Abbreviation is a method of word formation that constructs a shortened term from the first letters of the original phrase. Implicit abbreviations frequently cause comprehension difficulties for unprepared readers. In this paper, we propose an efficient ML-based algorithm that identifies these abbreviations in Russian texts. The method achieves a ROC AUC score of 0.926 and an F1 score of 0.706, which are competitive with the baselines. Along with the pipeline, we also establish what is, to our knowledge, the first Russian dataset relevant to this task.
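For readers less familiar with the two reported metrics, the following is a minimal sketch of how ROC AUC and F1 are computed for a binary "abbreviation / not abbreviation" classifier. The labels, scores, and the 0.5 decision threshold are made-up toy values chosen for illustration; they are not the paper's data or results, and the function names are hypothetical rather than taken from the authors' pipeline.

```python
# Toy illustration of the two metrics named in the abstract.
# All data below is invented for the example (assumption, not from the paper).

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical tokens: 1 = abbreviation, 0 = ordinary word.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]      # classifier confidences
threshold = 0.5                               # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

print(f"F1      = {f1_score(y_true, y_pred):.3f}")
print(f"ROC AUC = {roc_auc(y_true, scores):.3f}")
```

ROC AUC is computed from the raw scores and does not depend on the threshold, whereas F1 depends on the chosen cut-off. This is one reason a method can report a high ROC AUC alongside a more moderate F1.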