CL · Sep 15, 2021

Can Language Models be Biomedical Knowledge Bases?

arXiv:2109.07154v1 · 673 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for domain-specific knowledge bases in biomedicine, but its contribution is incremental: it builds on existing probing methods and mainly reveals their limitations.

The paper asks whether pre-trained language models can serve as biomedical knowledge bases. It introduces the BioLAMA benchmark of 49K triples and finds that biomedical LMs reach up to 18.51% Acc@5, but that predictions are highly correlated with the prompt templates rather than with the subjects, which limits their utility as knowledge bases.

Pre-trained language models (LMs) have become ubiquitous in solving various natural language processing (NLP) tasks. There has been increasing interest in what knowledge these LMs contain and how we can extract that knowledge, treating LMs as knowledge bases (KBs). While there has been much work on probing LMs in the general domain, there has been little attention to whether these powerful LMs can be used as domain-specific KBs. To this end, we create the BioLAMA benchmark, which is comprised of 49K biomedical factual knowledge triples for probing biomedical LMs. We find that biomedical LMs with recently proposed probing methods can achieve up to 18.51% Acc@5 on retrieving biomedical knowledge. Although this seems promising given the task difficulty, our detailed analyses reveal that most predictions are highly correlated with prompt templates without any subjects, hence producing similar results on each relation and hindering their capabilities to be used as domain-specific KBs. We hope that BioLAMA can serve as a challenging benchmark for biomedical factual probing.
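
The Acc@5 figure quoted above is a top-k accuracy: a probe counts as correct when any of the model's k highest-ranked predictions matches a gold object. The following is a minimal sketch of that metric, not the benchmark's official scorer. The function name `acc_at_k`, the case-insensitive string matching, and the use of a set of acceptable answers per triple are assumptions made for illustration.

```python
from typing import Sequence, Set


def acc_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Fraction of probes whose top-k predictions contain a gold answer.

    ranked_predictions[i] is the model's ranked output for probe i, best first.
    gold_answers[i] is the set of acceptable object strings for probe i
    (assumed here to include any synonyms).
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must align one to one")
    if not ranked_predictions:
        return 0.0

    hits = 0
    for preds, gold in zip(ranked_predictions, gold_answers):
        # Assumption: matching is exact after lowercasing and trimming.
        normalized_gold = {g.strip().lower() for g in gold}
        if any(p.strip().lower() in normalized_gold for p in preds[:k]):
            hits += 1
    return hits / len(ranked_predictions)


# Hypothetical toy data, not taken from BioLAMA.
toy_predictions = [["aspirin", "ibuprofen"], ["insulin"]]
toy_gold = [{"Ibuprofen"}, {"metformin"}]
toy_score = acc_at_k(toy_predictions, toy_gold, k=5)
```

Running the same function on predictions produced from prompts with the subject removed would give a rough sense of how much of the score comes from the template alone, which is the concern the abstract raises.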

Code Implementations: 1 repo