CL, Sep 26, 2019

Biomedical relation extraction with pre-trained language representations and minimal task-specific architecture

arXiv:1909.12411v1

Originality: Synthesis-oriented

AI Analysis

This work addresses biomedical text mining by applying a pre-trained language model to relation extraction; it is an incremental improvement rather than a new architecture.

The paper tackles biomedical relation extraction of "gene - function change - disease" triples by fine-tuning BERT with minimal task-specific architecture (a single linear classification layer). Despite considerable class imbalance, the system significantly outperforms a random baseline.

This paper presents our participation in the AGAC Track from the 2019 BioNLP Open Shared Tasks. We provide a solution for Task 3, which aims to extract "gene - function change - disease" triples, where "gene" and "disease" are mentions of particular genes and diseases respectively and "function change" is one of four pre-defined relationship types. Our system extends BERT (Devlin et al., 2018), a state-of-the-art language model, which learns contextual language representations from a large unlabelled corpus and whose parameters can be fine-tuned to solve specific tasks with minimal additional architecture. We encode the pair of mentions and their textual context as two consecutive sequences in BERT, separated by a special symbol. We then use a single linear layer to classify their relationship into five classes (four pre-defined, as well as 'no relation'). Despite considerable class imbalance, our system significantly outperforms a random baseline while relying on an extremely simple setup with no specially engineered features.
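The input layout and classification head described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the order of the two sequences (mentions first, context second), the whitespace tokenizer, the example sentence, the placeholder label names, and the random stand-ins for the encoder output and the trained weights are all hypothetical. A real system would take the pooled vector from a fine-tuned BERT model.

```python
import math
import random

# Placeholder label names: the abstract only says there are four
# pre-defined relation types plus 'no relation'.
LABELS = ["REL_1", "REL_2", "REL_3", "REL_4", "NO_RELATION"]


def build_pair_input(gene, disease, context):
    """Lay out a mention pair and its context as two BERT-style sequences."""
    # Whitespace splitting stands in for WordPiece tokenization (assumption).
    first = gene.split() + disease.split()
    second = context.split()
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    # Segment ids: 0 for [CLS], the first sequence and its [SEP]; 1 for the rest.
    segment_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return tokens, segment_ids


def linear_softmax(pooled, weights, bias):
    """Single linear layer followed by a softmax over the five classes."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative sentence, not taken from the shared-task data.
tokens, segment_ids = build_pair_input(
    "BRCA1", "breast cancer",
    "Mutations in BRCA1 are associated with breast cancer .")

# Random stand-ins for the encoder's pooled [CLS] vector and trained weights.
rng = random.Random(0)
hidden_size = 8
pooled = [rng.uniform(-1, 1) for _ in range(hidden_size)]
weights = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in LABELS]
bias = [0.0] * len(LABELS)

probs = linear_softmax(pooled, weights, bias)
predicted = LABELS[probs.index(max(probs))]
print(tokens)
print(segment_ids)
print(predicted)
```

Because the weights here are random, the predicted label is meaningless; the point is only the shape of the pipeline: one paired input per candidate (gene, disease) pair, and one five-way decision per input.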
