CL AI LG QM Nov 25, 2021

Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction?

arXiv:2112.02955v1 3 citations Has Code
Originality: Synthesis-oriented
AI Analysis

This is an incremental study for biomedical NLP researchers, focusing on enhancing domain-specific models for relation extraction in chemical-gene interactions.

The paper tackled the problem of relation extraction in the biomedical domain by testing whether adding syntactic information to pre-trained BERT models improves performance, finding that it increased precision but decreased recall, particularly for rare relations.

Relation extraction has recently been the subject of many studies. The DrugProt track at BioCreative VII provides a manually annotated corpus for developing and evaluating relation extraction systems that target interactions between chemicals and genes. We describe the ensemble system used for our submission, which combines the predictions of fine-tuned bioBERT, sciBERT and const-bioBERT models by majority voting. We specifically tested the contribution of syntactic information to relation extraction with BERT. We observed that adding constituent-based syntactic information to BERT improved precision but decreased recall, since relations rarely seen in the training set were less likely to be predicted by the BERT models infused with syntactic information. Our code is available online [https://github.com/Maple177/drugprot-relation-extraction].
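
The majority-voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The function name `majority_vote`, the `fallback_index` tie-breaking rule, and the `NONE` label are assumptions made here for illustration; the abstract does not say how ties between the three models are resolved.

```python
from collections import Counter


def majority_vote(predictions, fallback_index=0):
    """Combine per-model label lists into one list by majority voting.

    predictions: list of label lists, one per model, all of equal length.
    fallback_index: hypothetical tie-break rule; when no label wins a
    strict majority, the label of this model is used.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict the same number of examples")

    voted = []
    # zip(*predictions) yields, for each example, the labels from every model.
    for labels in zip(*predictions):
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count > len(labels) / 2:
            voted.append(top_label)  # strict majority found
        else:
            voted.append(labels[fallback_index])  # assumed tie-break
    return voted


# Illustrative labels only; "NONE" stands for "no relation" (an assumption).
biobert = ["INHIBITOR", "NONE", "SUBSTRATE"]
scibert = ["INHIBITOR", "ACTIVATOR", "NONE"]
const_biobert = ["NONE", "ACTIVATOR", "AGONIST"]

ensemble = majority_vote([biobert, scibert, const_biobert])
print(ensemble)
```

With three voters, a strict majority means at least two models agree; the third example above has three different labels, so the sketch falls back to the first model's prediction.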

