Chemical-protein relation extraction with ensembles of SVM, CNN, and RNN models
This work automates relation extraction from biomedical literature for researchers. It is incremental, however: it combines existing methods rather than introducing a new paradigm.
The authors tackled chemical-protein relation extraction from PubMed abstracts with an ensemble of SVM, CNN, and RNN models. The ensemble reached an F-score of 0.6410 (precision 0.7266, recall 0.5735), the highest performance in the 2017 CHEMPROT challenge.
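As a quick arithmetic check, the three reported figures are mutually consistent if the F-score is the standard balanced F-score (F1), that is, the harmonic mean of precision and recall. The sketch below assumes that definition. It does not reproduce the challenge's evaluation script or how results were aggregated over relation classes.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the CHEMPROT submission.
print(f"{f_score(0.7266, 0.5735):.4f}")  # 0.6410
```

Because the harmonic mean is dominated by the smaller of the two values, the relatively low recall (0.5735) is what keeps the F-score well below the precision.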
Text mining the relations between chemicals and proteins is an increasingly important task. The CHEMPROT track at BioCreative VI aims to promote the development and evaluation of systems that can automatically detect chemical-protein relations in running text (PubMed abstracts). This manuscript describes our submission, an ensemble of three systems: a Support Vector Machine, a Convolutional Neural Network, and a Recurrent Neural Network. Their outputs are combined by majority voting or by stacking. Our CHEMPROT system obtained a precision of 0.7266 and a recall of 0.5735, for an F-score of 0.6410, demonstrating the effectiveness of machine learning-based approaches for automatic relation extraction from biomedical literature. This was the highest performance in the task during the 2017 challenge.
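Majority voting over three base models can be sketched as follows. This is a minimal illustration under stated assumptions, not the submission's actual code. Each model is assumed to output one label per candidate chemical-protein pair. The label strings follow the CHEMPROT group names (CPR:3, CPR:4, CPR:5, CPR:6, CPR:9), and "NONE" is a hypothetical negative label. The tie-break rule, which falls back to one designated model when no label wins a strict majority, is also hypothetical. The abstract does not say how ties were resolved.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> list[str]:
    """Combine per-instance labels from several models by majority vote.

    predictions[m][i] is model m's label for candidate pair i.
    If no label gets a strict majority (e.g. all three models disagree),
    the label of model `fallback` is used (a hypothetical tie-break rule).
    """
    n_models = len(predictions)
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same instances")

    combined = []
    for i in range(n_items):
        votes = Counter(p[i] for p in predictions)
        label, count = votes.most_common(1)[0]
        # Accept the top label only if more than half of the models agree.
        combined.append(label if count > n_models / 2 else predictions[fallback][i])
    return combined


# Toy predictions from three hypothetical base models (SVM, CNN, RNN).
svm = ["CPR:3", "NONE", "CPR:4", "CPR:9"]
cnn = ["CPR:3", "CPR:4", "CPR:4", "CPR:6"]
rnn = ["NONE", "CPR:4", "CPR:5", "CPR:5"]
print(majority_vote([svm, cnn, rnn]))  # ['CPR:3', 'CPR:4', 'CPR:4', 'CPR:9']
```

In the last instance all three models disagree, so the fallback model's label (here the SVM's) is kept. Stacking, the other combination strategy, would instead train a meta-classifier on the base models' outputs. That approach is not shown here.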