CL LG | Apr 10, 2022

MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction

Saadullah Amin, Pasquale Minervini, David Chang, Pontus Stenetorp, Günter Neumann

arXiv:2204.04779v2 | h-index: 34 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a more accurate benchmark for researchers in biomedical NLP to evaluate distantly supervised relation extraction methods, though it is incremental as it builds on existing approaches.

The authors tackled the problem of inaccurate benchmarks for broad-coverage biomedical relation extraction by identifying issues like train-test leakage and narrow entity focus in existing datasets, and they introduced MedDistant19, a new benchmark derived from MEDLINE abstracts and SNOMED CT that addresses these shortcomings.

Relation extraction in the biomedical domain is challenging because labeled data is scarce and annotation is costly, requiring domain experts. Distant supervision is commonly used to address this scarcity by automatically pairing knowledge graph relationships with raw text. Such a pipeline is prone to noise and is harder to scale when it must cover a large number of biomedical concepts. We investigated existing broad-coverage distantly supervised biomedical relation extraction benchmarks and found a significant overlap between training and test relationships, ranging from 26% to 86%. Furthermore, we noticed several inconsistencies in the data construction process of these benchmarks, and where there is no train-test leakage, the focus is on interactions between narrower entity types. This work presents MedDistant19, a more accurate benchmark for broad-coverage distantly supervised biomedical relation extraction that addresses these shortcomings and is obtained by aligning MEDLINE abstracts with the widely used SNOMED Clinical Terms knowledge base. Because a thorough evaluation with domain-specific language models has been lacking, we also conduct experiments that test whether general-domain relation extraction findings carry over to biomedical relation extraction.
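The two ideas in the abstract, distant labeling and train-test leakage, can be illustrated with a minimal Python sketch. It is not the MedDistant19 pipeline: the function names `distant_label` and `triple_overlap`, the toy triples, and the definition of overlap as the fraction of test triples that also occur in training are all assumptions made for illustration.

```python
# Minimal sketch of distant supervision and a train-test overlap check.
# All names and data here are hypothetical, not taken from MedDistant19.

def distant_label(sentences, kg_triples):
    """Tag a sentence with a KG relation when both entities of a triple occur in it.

    sentences:  list of (text, [entity, ...]) pairs with entities already linked.
    kg_triples: list of (head, relation, tail) tuples.
    """
    pair_to_rel = {(h, t): r for h, r, t in kg_triples}
    labeled = []
    for text, entities in sentences:
        for head in entities:
            for tail in entities:
                if head != tail and (head, tail) in pair_to_rel:
                    labeled.append((head, pair_to_rel[(head, tail)], tail, text))
    return labeled


def triple_overlap(train, test):
    """Fraction of distinct test triples that also appear in the training set."""
    train_triples = {(h, r, t) for h, r, t, _ in train}
    test_triples = {(h, r, t) for h, r, t, _ in test}
    if not test_triples:
        return 0.0
    return len(train_triples & test_triples) / len(test_triples)


kg = [
    ("aspirin", "may_treat", "headache"),
    ("insulin", "may_treat", "diabetes"),
]
train_sentences = [("Aspirin relieves headache.", ["aspirin", "headache"])]
test_sentences = [
    ("Aspirin did not help the headache.", ["aspirin", "headache"]),
    ("Insulin is used in diabetes.", ["insulin", "diabetes"]),
]

train = distant_label(train_sentences, kg)
test = distant_label(test_sentences, kg)
print(triple_overlap(train, test))
```

The first test sentence shows where the noise comes from: it is labeled `may_treat` even though the text states the opposite, because the labeling only checks that both entities co-occur. The same triple also appears in the training split, which is the kind of leakage the overlap figure is meant to expose.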
