Enhancing Biomedical Relation Extraction with Directionality
This work addresses a specific bottleneck in biomedical relation extraction by adding directionality to a key dataset, enabling more accurate modeling of complex biological networks for researchers in bioinformatics and computational biology.
The authors tackled the problem of missing directionality (subject/object roles) in biomedical relation extraction by annotating the BioRED corpus with 10,864 directionality annotations and proposing a multi-task language model with soft-prompt learning, which outperformed state-of-the-art models like GPT-4 and Llama-3 on benchmarking tasks.
Biological relation networks contain rich information for understanding the biological mechanisms behind the relationships among entities such as genes, proteins, diseases, and chemicals. The rapid growth of biomedical literature poses significant challenges for keeping this network knowledge up to date. The recent Biomedical Relation Extraction Dataset (BioRED) provides valuable manual annotations, facilitating the development of machine-learning and pre-trained language model approaches for automatically identifying novel document-level (inter-sentence context) relationships. Nonetheless, its annotations lack directionality (subject/object) for the entity roles, which is essential for studying complex biological networks. Herein we annotate the entity roles of the relationships in the BioRED corpus and subsequently propose a novel multi-task language model with soft-prompt learning to jointly identify the relationship, novel findings, and entity roles. Our results include an enriched BioRED corpus with 10,864 directionality annotations. Moreover, our proposed method outperforms existing large language models such as the state-of-the-art GPT-4 and Llama-3 on two benchmarking tasks. Our source code and dataset are available at https://github.com/ncbi-nlp/BioREDirect.
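To make concrete what a directionality annotation adds, the following is a minimal sketch of a relation record with an optional subject role. It is not the BioREDirect data format: the class name `Relation`, its fields, and the placeholder entity identifiers `chem_X` and `gene_Y` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical record for one document-level relation between two entities."""
    entity_a: str
    entity_b: str
    relation_type: str
    novel: bool                    # whether the relation is a novel finding
    subject: Optional[str] = None  # entity in the subject role; None = no direction

    def is_directed(self) -> bool:
        return self.subject is not None

    def as_edge(self) -> Tuple[str, str]:
        """Return (subject, object) if directed, else the pair in sorted order."""
        if self.subject is None:
            a, b = sorted((self.entity_a, self.entity_b))
            return (a, b)
        if self.subject not in (self.entity_a, self.entity_b):
            raise ValueError("subject must be one of the two entities")
        obj = self.entity_b if self.subject == self.entity_a else self.entity_a
        return (self.subject, obj)


# Placeholder identifiers, not taken from the corpus.
undirected = Relation("chem_X", "gene_Y", "Positive_Correlation", novel=True)
directed = Relation("chem_X", "gene_Y", "Positive_Correlation", novel=True,
                    subject="gene_Y")
```

Without the subject role, the two entities can only be connected by an unordered edge; with it, the same relation becomes a directed edge from subject to object, which is the information a network analysis needs.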