CLAI | May 5

Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

arXiv:2605.04221 | 14.1 | h-index: 6
AI Analysis

For clinical NLP practitioners needing privacy-compliant entity extraction from domain-specific notes, this work provides a locally deployable framework with strong performance, though the approach is incremental.

The paper tackles clinical named entity extraction from unstructured dental notes. After DPO, Qwen2.5-14B-Instruct reaches micro/macro F1 scores of 0.864/0.837, which suggests that automated prompt optimization and lightweight post-training can enable scalable, privacy-sensitive extraction with small language models.

Clinical named entity recognition from dental progress notes is challenging because documentation is highly unstructured, domain-specific, and often privacy-sensitive. We developed a locally deployable framework that enables small language models to self-generate, verify, refine, and evaluate entity-specific prompts for extracting multiple clinical entities from dental notes. Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization. Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks. Qwen2.5-14B-Instruct achieved the strongest baseline performance. After DPO, Qwen2.5-14B-Instruct and Llama-3.1-8B-Instruct achieved micro/macro F1 scores of 0.864/0.837 and 0.806/0.797, respectively. These findings suggest that automated prompt optimization combined with lightweight preference-based post-training can support scalable clinical information extraction using locally deployed small language models.
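The abstract reports both micro and macro F1, and the two can diverge when entity types differ in frequency or difficulty. The following is a minimal sketch of how the two averages are commonly computed, not the authors' evaluation code. It assumes exact matching on (entity type, value) pairs, which the abstract does not specify, and the function names and the entity types in the usage note are hypothetical.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(gold, pred):
    """Compute (micro_f1, macro_f1) over notes.

    gold, pred: lists with one set per note, each set holding
    (entity_type, value) pairs. Exact-match scoring is an assumption here.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for etype, _ in g & p:   # predicted and annotated
            tp[etype] += 1
        for etype, _ in p - g:   # predicted but not annotated
            fp[etype] += 1
        for etype, _ in g - p:   # annotated but missed
            fn[etype] += 1

    types = sorted(set(tp) | set(fp) | set(fn))
    # Micro: pool counts across all entity types, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per entity type, then take the unweighted mean.
    macro = sum(f1(tp[t], fp[t], fn[t]) for t in types) / len(types) if types else 0.0
    return micro, macro
```

For example, with two hypothetical notes where the gold sets are `{("tooth", "14"), ("procedure", "extraction")}` and `{("tooth", "3")}`, and the predictions are `{("tooth", "14"), ("procedure", "filling")}` and `{("tooth", "3"), ("tooth", "4")}`, the micro F1 exceeds the macro F1 because the poorly extracted entity type counts as much as the well extracted one in the macro average.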

