End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
This work addresses a key bottleneck in biomedical text mining for researchers and practitioners, though the contribution is incremental because it builds on existing neural joint learning approaches.
The paper tackles the problem of linking biomedical disease names to concepts, especially for unseen concepts not in the training data, by introducing an end-to-end model that combines span representations with dictionary matching, achieving competitive results on two major datasets.
Disease name recognition and normalization, which is generally called biomedical entity linking, is a fundamental process in biomedical text mining. Recently, neural joint learning of both tasks has been proposed to exploit their mutual benefits. While this approach achieves high performance, disease concepts that do not appear in the training dataset cannot be accurately predicted. This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features to address this problem. Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models, in an end-to-end fashion. Experiments using two major datasets demonstrate that our model achieves results competitive with strong baselines, especially for concepts unseen during training.
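
The idea of pairing candidate spans with dictionary-matching features can be illustrated with a minimal sketch. It is not the paper's implementation: the toy dictionary, the placeholder concept IDs, the function names, the `max_len` limit, and the use of exact lowercased string matching are all assumptions made for illustration. In a neural model, the resulting match flag would presumably be combined with a learned span representation; that step is omitted here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary: lowercased disease name -> placeholder concept ID.
# These IDs are illustrative only, not real identifiers.
TOY_DICTIONARY: Dict[str, str] = {
    "breast cancer": "CONCEPT_1",
    "cancer": "CONCEPT_2",
}


def enumerate_spans(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Return every (start, end) token span, end exclusive, of at most max_len tokens."""
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1)
    ]


def dictionary_feature(
    tokens: List[str], span: Tuple[int, int], dictionary: Dict[str, str]
) -> Tuple[float, Optional[str]]:
    """Return (match flag, concept ID or None) using exact lowercased matching (an assumption)."""
    start, end = span
    surface = " ".join(tokens[start:end]).lower()
    concept = dictionary.get(surface)
    return (1.0 if concept is not None else 0.0, concept)


def span_features(
    tokens: List[str], dictionary: Dict[str, str], max_len: int = 3
) -> List[dict]:
    """Attach a dictionary-matching feature to every candidate span."""
    rows = []
    for span in enumerate_spans(tokens, max_len):
        flag, concept = dictionary_feature(tokens, span, dictionary)
        rows.append(
            {
                "span": span,
                "text": " ".join(tokens[span[0]:span[1]]),
                "dict_match": flag,
                "concept": concept,
            }
        )
    return rows


if __name__ == "__main__":
    sentence = ["BRCA1", "mutations", "cause", "breast", "cancer"]
    for row in span_features(sentence, TOY_DICTIONARY):
        if row["dict_match"]:
            print(row["span"], row["text"], row["concept"])
```

Because the lookup depends only on the dictionary, a span can receive a match signal even if its concept never occurred in training data, which is the intuition behind handling unseen concepts.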