CL, AI
Dec 28, 2025

Harnessing Large Language Models for Biomedical Named Entity Recognition

arXiv:2512.22738v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the challenge of domain adaptation for biomedical informatics, offering an incremental improvement in data efficiency for downstream applications like drug discovery.

The paper tackles the problem of adapting large language models to biomedical named entity recognition. It introduces BioSelectTune, a data-centric framework that uses Hybrid Superfiltering to curate high-quality training data. The authors report state-of-the-art performance on multiple benchmarks; trained on only 50% of the curated positive data, the model outperforms specialized models like BioMedBERT.

Background and Objective: Biomedical Named Entity Recognition (BioNER) is a foundational task in medical informatics, crucial for downstream applications like drug discovery and clinical trial matching. However, adapting general-domain Large Language Models (LLMs) to this task is often hampered by their lack of domain-specific knowledge and the performance degradation caused by low-quality training data. To address these challenges, we introduce BioSelectTune, a highly efficient, data-centric framework for fine-tuning LLMs that prioritizes data quality over quantity.

Methods and Results: BioSelectTune reformulates BioNER as a structured JSON generation task and leverages our novel Hybrid Superfiltering strategy, a weak-to-strong data curation method that uses a homologous weak model to distill a compact, high-impact training dataset.

Conclusions: Through extensive experiments, we demonstrate that BioSelectTune achieves state-of-the-art (SOTA) performance across multiple BioNER benchmarks. Notably, our model, trained on only 50% of the curated positive data, not only surpasses the fully-trained baseline but also outperforms powerful domain-specialized models like BioMedBERT.
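
The abstract does not give the JSON schema or the scoring rule used by Hybrid Superfiltering, so the following is only a minimal sketch of the two ideas under stated assumptions. The `{"entities": [{"text", "type"}]}` layout, the function names, and the per-sample `scores` (imagined here as coming from a smaller weak model) are all hypothetical and are not taken from the paper.

```python
import json


def build_target(entities):
    """Serialize gold (text, type) pairs as the JSON string a model would be trained to emit.

    The schema is an assumption made for illustration only.
    """
    return json.dumps({"entities": [{"text": t, "type": ty} for t, ty in entities]})


def parse_prediction(raw):
    """Parse generated text into a set of (text, type) pairs.

    Malformed or unexpected output yields an empty set instead of raising.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return set()
    if not isinstance(data, dict):
        return set()
    items = data.get("entities", [])
    if not isinstance(items, list):
        return set()
    parsed = set()
    for item in items:
        if (
            isinstance(item, dict)
            and isinstance(item.get("text"), str)
            and isinstance(item.get("type"), str)
        ):
            parsed.add((item["text"], item["type"]))
    return parsed


def select_top_fraction(samples, scores, fraction=0.5):
    """Keep the highest-scoring fraction of samples, preserving their original order.

    `scores` stands in for a weak-model quality score; how the paper
    computes it is not stated in the abstract.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not samples:
        return []
    k = max(1, int(len(samples) * fraction))
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return [samples[i] for i in keep]
```

For example, `parse_prediction(build_target([("aspirin", "Chemical")]))` round-trips to a one-element set, and `select_top_fraction(data, scores, 0.5)` keeps the better-scored half, mirroring the 50% subset mentioned in the abstract.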

