CL | AI | Aug 28, 2023

Biomedical Entity Linking with Triple-aware Pre-Training

arXiv:2308.14429v1 | 3 citations | h-index: 27
Originality: Synthesis-oriented
AI Analysis

This paper addresses biomedical entity linking for NLP tasks such as text mining, but it appears incremental: it builds on prior knowledge graph injection methods and shows no clear gains.

The paper tackles biomedical entity linking by proposing a triple-aware pre-training framework for generative large language models, but its evaluations did not confirm any benefit from including synonym, description, or relational information.

Linking biomedical entities is an essential part of biomedical natural language processing tasks such as text mining and question answering. However, current large language models (LLMs) trained on a general corpus struggle with this task because biomedical entities are sparsely distributed in text and have therefore rarely been seen by the LLM during training. At the same time, those LLMs are not aware of the high-level semantic connections between different biomedical entities, which are useful for identifying similar concepts in different textual contexts. To cope with these problems, some recent works have focused on injecting knowledge graph (KG) information into LLMs. However, previous methods either ignore the relational knowledge of the entities or lead to catastrophic forgetting. Therefore, we propose a novel framework to pre-train a powerful generative LLM on a corpus synthesized from a KG. In our evaluations, we are unable to confirm the benefit of including synonym, description, or relational information.
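To make the idea of "a corpus synthesized from a KG" more concrete, the following is a minimal sketch of how triples, synonyms, and descriptions could be turned into plain-text pre-training lines. It is not the authors' implementation: the `Triple` class, the `verbalize` and `build_corpus` helpers, the sentence templates, and the example entities are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One KG edge (hypothetical structure, not taken from the paper)."""
    head: str
    relation: str
    tail: str


def verbalize(triple: Triple) -> str:
    """Turn a triple into a sentence with a naive template (an assumption)."""
    relation_text = triple.relation.replace("_", " ")
    return f"{triple.head} {relation_text} {triple.tail}."


def build_corpus(triples, synonyms=None, descriptions=None):
    """Collect synonym, description, and relation lines into one list."""
    synonyms = synonyms or {}
    descriptions = descriptions or {}
    lines = []
    # Synonym lines: map each alternative name to its canonical entity.
    for entity, names in synonyms.items():
        for name in names:
            lines.append(f"{name} is also known as {entity}.")
    # Description lines: attach a free-text definition to the entity.
    for entity, text in descriptions.items():
        lines.append(f"{entity}: {text}")
    # Relation lines: one verbalized sentence per triple.
    lines.extend(verbalize(t) for t in triples)
    return lines


# Illustrative data only; a real KG would supply these.
example_triples = [Triple("aspirin", "may_treat", "headache")]
example_synonyms = {"aspirin": ["acetylsalicylic acid"]}

for line in build_corpus(example_triples, example_synonyms):
    print(line)
```

Each of the three information types (synonyms, descriptions, relations) is kept as a separate optional input here, so that a corpus could be built with or without any one of them when comparing their effect.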
