CL AI Nov 11, 2021

CU-UD: text-mining drug and chemical-protein interactions with ensembles of BERT-based models

Mehmet Efruz Karabulut, K. Vijay-Shanker, Yifan Peng

arXiv:2112.03004v1 0.2 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a text-mining task relevant to biomedical researchers, but it is incremental: it applies existing ensemble methods to a specific benchmark.

The paper tackles the problem of automatically detecting relations between chemicals and proteins in PubMed abstracts using an ensemble of BERT-based models, and reports an F1 score of 0.7739, with a precision of 0.7708 and a recall of 0.7770.
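The three reported figures are mutually consistent, since F1 is the harmonic mean of precision and recall. A minimal check, assuming the standard F1 definition (the function name `f1_score` is illustrative and not taken from the authors' code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the summary above.
reported_f1 = f1_score(0.7708, 0.7770)
print(round(reported_f1, 4))  # 0.7739
```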

Identifying the relations between chemicals and proteins is an important text mining task. The BioCreative VII Track 1 DrugProt task aims to promote the development and evaluation of systems that can automatically detect relations between chemical compounds/drugs and genes/proteins in PubMed abstracts. In this paper, we describe our submission, an ensemble system that includes multiple BERT-based language models. We combine the outputs of the individual models using majority voting and a multilayer perceptron. Our system obtained 0.7708 in precision and 0.7770 in recall, for an F1 score of 0.7739, demonstrating the effectiveness of using ensembles of BERT-based language models for automatically detecting relations between chemicals and proteins. Our code is available at https://github.com/bionlplab/drugprot_bcvii.
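The abstract does not spell out how the majority vote is taken, so the following is only a rough sketch of one common way to combine per-model relation labels. The function name `majority_vote`, the example labels, and the rule of falling back to a "no relation" label on ties are all assumptions made for illustration; they are not drawn from the authors' repository.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]],
                  tie_label: str = "NONE") -> List[str]:
    """Combine per-model label sequences by plurality vote.

    predictions[m][i] is the label that model m assigns to candidate pair i.
    Ties for the top count fall back to tie_label (an assumed policy).
    """
    if not predictions:
        return []
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must label the same number of candidates")

    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels).most_common()
        top_label, top_count = counts[0]
        # If the runner-up has the same count, no label has a clear plurality.
        if len(counts) > 1 and counts[1][1] == top_count:
            voted.append(tie_label)
        else:
            voted.append(top_label)
    return voted


# Hypothetical outputs of three models on three chemical-protein pairs.
model_a = ["INHIBITOR", "NONE", "ACTIVATOR"]
model_b = ["INHIBITOR", "SUBSTRATE", "NONE"]
model_c = ["NONE", "SUBSTRATE", "INHIBITOR"]

print(majority_vote([model_a, model_b, model_c]))
# ['INHIBITOR', 'SUBSTRATE', 'NONE']
```

Falling back to "no relation" on ties is a conservative choice that tends to favor precision over recall. The abstract also mentions a multilayer perceptron as a second way of combining model outputs, which this sketch does not cover.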
