CL · AI · Oct 15, 2021

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

arXiv:2110.08228v1662 citations
Originality: Highly original
AI Analysis

This work addresses the disambiguation of rare entities in biomedical text. For researchers and practitioners, it represents a strong, specific gain rather than a foundational advance.

The paper tackles named entity disambiguation in biomedical text by proposing a cross-domain data integration method that transfers structural knowledge from general text to the medical domain. The resulting model achieves state-of-the-art performance on the MedMentions and BC5CDR benchmarks and improves rare entity disambiguation by up to 57 accuracy points.

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.
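To make the idea of transferring structural knowledge concrete, here is a minimal toy sketch in Python. It is not the authors' method or code: the entity IDs, type labels, cross-KB links, and the overlap-based scoring are all hypothetical and chosen only for illustration. It shows how fine-grained types copied from a general knowledge base onto linked medical entities can break a tie that coarse medical types cannot.

```python
from typing import Dict, List, Set

# Hypothetical medical KB: both candidates share one coarse type.
medical_kb: Dict[str, Set[str]] = {
    "M1": {"finding"},  # "cold" as low temperature
    "M2": {"finding"},  # "cold" as the common cold
}

# Hypothetical general-text KB with finer-grained type labels.
general_kb: Dict[str, Set[str]] = {
    "G1": {"temperature", "weather"},
    "G2": {"disease", "infection"},
}

# Hypothetical links from medical entities to general-KB entities.
links: Dict[str, str] = {"M1": "G1", "M2": "G2"}


def integrate(med: Dict[str, Set[str]],
              gen: Dict[str, Set[str]],
              link: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return a copy of the medical KB with types added from linked general entities."""
    return {eid: types | gen.get(link.get(eid, ""), set())
            for eid, types in med.items()}


def disambiguate(context: Set[str],
                 candidates: List[str],
                 kb: Dict[str, Set[str]]) -> str:
    """Pick the candidate whose type labels overlap most with the context words.

    Ties resolve to the earliest candidate, since max() returns the first maximum.
    """
    return max(candidates, key=lambda eid: len(kb[eid] & context))


context = {"patient", "viral", "infection", "cold"}
candidates = ["M1", "M2"]

before = disambiguate(context, candidates, medical_kb)
after = disambiguate(context, candidates, integrate(medical_kb, general_kb, links))
print(before, after)
```

With only the coarse types, both candidates score zero and the first one wins by default. After integration, the candidate linked to the "infection" type overlaps with the context and is selected instead. A real system would use learned representations rather than literal word overlap.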

Code Implementations: 1 repo
