cs.CL, Sep 26, 2019

Biomedical relation extraction with pre-trained language representations and minimal task-specific architecture

arXiv:1909.12411v1
Originality: synthesis-oriented
AI Analysis

This work addresses biomedical text mining, offering an incremental improvement built on pre-trained language models.

The paper tackles biomedical relation extraction of "gene - function change - disease" triples by fine-tuning BERT with a minimal task-specific architecture, significantly outperforming a random baseline despite class imbalance.

This paper presents our participation in the AGAC Track of the 2019 BioNLP Open Shared Tasks. We provide a solution for Task 3, which aims to extract "gene - function change - disease" triples, where "gene" and "disease" are mentions of particular genes and diseases respectively, and "function change" is one of four pre-defined relationship types. Our system extends BERT (Devlin et al., 2018), a state-of-the-art language model that learns contextual language representations from a large unlabelled corpus and whose parameters can be fine-tuned to solve specific tasks with minimal additional architecture. We encode the pair of mentions and their textual context as two consecutive sequences in BERT, separated by a special symbol. We then use a single linear layer to classify their relationship into five classes (the four pre-defined types, plus "no relation"). Despite considerable class imbalance, our system significantly outperforms a random baseline while relying on an extremely simple setup with no specially engineered features.
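The encoding and classification step described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' code. It assumes the two mentions are simply concatenated (gene first) to form the first segment, uses whitespace splitting in place of BERT's WordPiece tokenizer, substitutes placeholder names for the four relation types (the abstract does not list them), and feeds the linear layer a made-up vector where BERT's pooled [CLS] output would go. The names `build_pair_input`, `LinearRelationHead` and `LABELS` are hypothetical, and the example sentence is invented.

```python
import math
import random
from typing import List, Sequence, Tuple

# HYPOTHETICAL label names: the four pre-defined "function change" types are
# not listed here, so generic placeholders stand in for them.
LABELS = ["change_type_1", "change_type_2", "change_type_3", "change_type_4", "no_relation"]


def build_pair_input(gene: str, disease: str, context: str) -> Tuple[List[str], List[int]]:
    """Lay out the mention pair and its context as two BERT segments.

    Layout: [CLS] gene disease [SEP] context [SEP]
    Segment id 0 covers [CLS], the mentions and the first [SEP];
    segment id 1 covers the context and the final [SEP].
    ASSUMPTION: whitespace splitting stands in for WordPiece tokenization.
    """
    first = ["[CLS]"] + gene.split() + disease.split() + ["[SEP]"]
    second = context.split() + ["[SEP]"]
    return first + second, [0] * len(first) + [1] * len(second)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearRelationHead:
    """A single linear layer mapping a pooled [CLS] vector to one logit per class."""

    def __init__(self, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Small random weights, as at the start of fine-tuning; real weights are learned.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(len(LABELS))]
        self.bias = [0.0] * len(LABELS)

    def logits(self, cls_vector: Sequence[float]) -> List[float]:
        if len(cls_vector) != self.hidden_size:
            raise ValueError("cls_vector length must equal hidden_size")
        # z_k = w_k . h + b_k for each of the five classes k
        return [sum(w * h for w, h in zip(row, cls_vector)) + b
                for row, b in zip(self.weight, self.bias)]

    def predict(self, cls_vector: Sequence[float]) -> Tuple[str, List[float]]:
        probs = softmax(self.logits(cls_vector))
        best = max(range(len(probs)), key=probs.__getitem__)
        return LABELS[best], probs


if __name__ == "__main__":
    tokens, segment_ids = build_pair_input(
        "BRCA1", "breast cancer", "mutations in BRCA1 increase risk")
    print(tokens)
    print(segment_ids)
    # Stand-in for BERT's pooled [CLS] output (768-dim in BERT-base; 8 here).
    head = LinearRelationHead(hidden_size=8)
    print(head.predict([0.1] * 8))
```

In a real system, a BERT tokenizer called on a text pair (for example Hugging Face's `BertTokenizer` with `text` and `text_pair`) produces the same `[CLS] A [SEP] B [SEP]` layout along with its segment ids (`token_type_ids`), and the linear layer's weights are learned jointly with BERT's own parameters during fine-tuning.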

