CL · LG · Apr 10, 2022

MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction

arXiv:2204.04779v2 · 582 citations · h-index: 34
Originality: Synthesis-oriented
AI Analysis

This work provides a more accurate benchmark for researchers in biomedical NLP to evaluate distantly supervised relation extraction methods, though it is incremental as it builds on existing approaches.

The authors address inaccurate benchmarks for broad-coverage biomedical relation extraction. They identify train-test leakage and a focus on narrow entity types in existing datasets, and they introduce MedDistant19, a new benchmark derived from MEDLINE abstracts and SNOMED CT that addresses these shortcomings.

Relation extraction in the biomedical domain is challenging because labeled data is scarce and annotation is costly, requiring domain experts. Distant supervision is commonly used to address this scarcity by automatically pairing knowledge graph relationships with raw text. Such a pipeline is prone to noise and becomes harder to scale when it must cover a large number of biomedical concepts. We investigated existing broad-coverage distantly supervised biomedical relation extraction benchmarks and found a significant overlap between training and test relationships, ranging from 26% to 86%. Furthermore, we noticed several inconsistencies in the data construction process of these benchmarks, and where there is no train-test leakage, the focus is on interactions between narrower entity types. This work presents MedDistant19, a more accurate benchmark for broad-coverage distantly supervised biomedical relation extraction that addresses these shortcomings and is obtained by aligning MEDLINE abstracts with the widely used SNOMED Clinical Terms knowledge base. Because thorough evaluation with domain-specific language models has been lacking, we also conduct experiments testing whether general-domain relation extraction findings carry over to biomedical relation extraction.
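To make the two ideas in the abstract concrete, here is a minimal Python sketch of distant supervision and of a train-test leakage check. It is an illustration under stated assumptions, not the authors' pipeline: the function names `distant_label` and `triple_overlap`, the toy knowledge base, and the choice to measure leakage as the fraction of test triples that also appear in training are all hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]            # (head entity, tail entity)
Triple = Tuple[str, str, str]     # (head entity, relation, tail entity)


def distant_label(
    sentences: List[Tuple[str, List[str]]],
    kb: Dict[Pair, str],
) -> List[Tuple[str, Triple]]:
    """Label a sentence with a KB relation whenever both entities co-occur in it.

    This is the noisy heuristic behind distant supervision: co-occurrence is
    taken as evidence of the relation, even if the sentence does not express it.
    """
    labeled = []
    for text, entities in sentences:
        # Try every ordered entity pair, since KB relations are directional.
        for head, tail in permutations(entities, 2):
            relation = kb.get((head, tail))
            if relation is not None:
                labeled.append((text, (head, relation, tail)))
    return labeled


def triple_overlap(train: Set[Triple], test: Set[Triple]) -> float:
    """Fraction of test triples that also occur in the training set (assumed metric)."""
    if not test:
        return 0.0
    return len(train & test) / len(test)


# Hypothetical toy knowledge base and pre-tagged sentences.
toy_kb = {("aspirin", "headache"): "may_treat"}
toy_sentences = [
    ("Aspirin relieves headache.", ["aspirin", "headache"]),
    ("Insulin and aspirin were compared.", ["insulin", "aspirin"]),
]
toy_labeled = distant_label(toy_sentences, toy_kb)

train_triples = {("aspirin", "may_treat", "headache"), ("insulin", "may_treat", "diabetes")}
test_triples = {("aspirin", "may_treat", "headache"), ("metformin", "may_treat", "diabetes")}
toy_leakage = triple_overlap(train_triples, test_triples)
```

In this toy setup only the first sentence receives a label, and half of the test triples also occur in training. A leakage-free split would keep the two triple sets disjoint, so that `triple_overlap` returns 0.0.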

Code Implementations: 1 repo
