LLMs in Biomedicine: A study on clinical Named Entity Recognition
This work addresses the problem of data scarcity and language complexity in biomedical NLP for researchers and practitioners, representing an incremental advancement by adapting existing techniques like RAG to this domain.
The paper tackles the challenge of applying Large Language Models (LLMs) to biomedical Named Entity Recognition (NER) by exploring prompt design and external knowledge integration. Strategic selection of in-context examples yields a roughly 15-20% F1 score improvement in few-shot settings, and a proposed method called DiRAG improves zero-shot performance.
Large Language Models (LLMs) demonstrate remarkable versatility across NLP tasks but encounter distinct challenges in the biomedical domain due to the complexity of its language and the scarcity of data. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to enhance their performance on the Named Entity Recognition (NER) task. Our study reveals the importance of meticulously designed prompts in the biomedical domain. Strategic selection of in-context examples yields a marked improvement, offering a ~15-20\% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs to meet the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), can boost the zero-shot F1 score of LLMs for biomedical NER. Code is released at \url{https://github.com/masoud-monajati/LLM_Bio_NER}.
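To make the idea of selecting in-context examples concrete, the following is a minimal sketch, not the procedure used in this paper. It assumes a simple token-overlap (Jaccard) similarity as the selection criterion; the abstract does not state which criterion the study uses. The function names (`tokenize`, `jaccard`, `select_examples`, `build_prompt`), the prompt wording, and the toy sentences are all hypothetical and exist only for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_examples(query, pool, k=2):
    """Return the k annotated examples most similar to the query sentence.

    Similarity here is token overlap, an assumption made for this sketch.
    """
    q = tokenize(query)
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(q, tokenize(ex["text"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, examples):
    """Assemble a few-shot NER prompt from the selected examples (hypothetical format)."""
    lines = ["Extract the disease mentions from each sentence."]
    for ex in examples:
        lines.append(f"Sentence: {ex['text']}")
        lines.append(f"Entities: {', '.join(ex['entities'])}")
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)

# Toy annotated pool, invented for illustration only.
pool = [
    {"text": "The patient was diagnosed with type 2 diabetes.",
     "entities": ["type 2 diabetes"]},
    {"text": "Mutations in BRCA1 increase breast cancer risk.",
     "entities": ["breast cancer"]},
    {"text": "He has a history of hypertension.",
     "entities": ["hypertension"]},
]

query = "The patient was diagnosed with hypertension."
chosen = select_examples(query, pool, k=2)
prompt = build_prompt(query, chosen)
print(prompt)
```

In this sketch the two examples sharing the most tokens with the query are placed in the prompt ahead of the query itself. A real system would likely replace the token-overlap score with a stronger similarity measure, but the overall structure (rank the pool, keep the top k, format the prompt) stays the same.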