RuBioRoBERTa: a pre-trained biomedical language model for Russian language biomedical text mining
This work addresses the limited NLP resources available to Russian biomedical researchers, though the contribution is incremental: it adapts existing BERT methods to a new language and domain.
The authors tackled the lack of pre-trained language models for Russian biomedical text mining by developing RuBioBERT and RuBioRoBERTa, achieving state-of-the-art results on the RuMedBench benchmark across tasks like classification and question answering.
This paper presents several BERT-based models for Russian-language biomedical text mining (RuBioBERT, RuBioRoBERTa). The models are pre-trained on a corpus of freely available texts in the Russian biomedical domain. With this pre-training, our models demonstrate state-of-the-art results on RuMedBench, a Russian medical language understanding benchmark that covers a diverse set of tasks, including text classification, question answering, natural language inference, and named entity recognition.