AutoPCR: Automated Phenotype Concept Recognition by Prompting
This work addresses the problem of generalizing phenotype concept recognition across diverse text types and evolving terminology, a concern for biomedical researchers and practitioners. The contribution is incremental, since it builds on existing prompting and retrieval techniques.
The authors tackle phenotype concept recognition in biomedical text by introducing AutoPCR, a prompt-based method that requires no ontology-specific training and achieves the best average and most robust performance across four benchmark datasets.
Phenotype concept recognition (CR) is a fundamental task in biomedical text mining, enabling applications such as clinical diagnostics and knowledge graph construction. However, existing methods often require ontology-specific training and struggle to generalize across diverse text types and evolving biomedical terminology. We present AutoPCR, a prompt-based phenotype CR method that does not require ontology-specific training. AutoPCR performs CR in three stages: entity extraction using a hybrid of rule-based and neural tagging strategies, candidate retrieval via SapBERT, and entity linking through prompting a large language model. Experiments on four benchmark datasets show that AutoPCR achieves the best average and most robust performance across both mention-level and document-level evaluations, surpassing prior state-of-the-art methods. Further ablation and transfer studies demonstrate its inductive capability and generalizability to new ontologies.
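The three stages described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation, and every name in it is hypothetical: `extract_mentions` enumerates word n-grams as a stand-in for the hybrid rule-based and neural tagger, `retrieve` ranks ontology labels by character-trigram cosine similarity as a stand-in for SapBERT embeddings, and `build_prompt` only assembles the text that would be sent to a large language model (no model is called). The three-entry `ONTOLOGY` dictionary is a toy example using HPO-style identifiers.

```python
from collections import Counter
from math import sqrt

# Toy ontology for illustration only (HPO-style IDs mapped to labels).
ONTOLOGY = {
    "HP:0001250": "Seizure",
    "HP:0001263": "Global developmental delay",
    "HP:0000252": "Microcephaly",
}


def trigrams(text):
    """Bag of character trigrams over the lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_mentions(text, max_len=3):
    """Stage 1 (stand-in): enumerate word n-grams as candidate mentions."""
    words = text.replace(",", " ").replace(".", " ").split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def retrieve(mention, k=2):
    """Stage 2 (stand-in): top-k ontology concepts by trigram similarity."""
    query = trigrams(mention)
    scored = [
        (cosine(query, trigrams(label)), concept_id, label)
        for concept_id, label in ONTOLOGY.items()
    ]
    scored.sort(reverse=True)
    return scored[:k]


def build_prompt(mention, context, candidates):
    """Stage 3 (prompt only): format a linking question for an LLM."""
    options = "\n".join(
        f"{i + 1}. {concept_id} {label}"
        for i, (_, concept_id, label) in enumerate(candidates)
    )
    return (
        f"Context: {context}\n"
        f"Mention: {mention}\n"
        f"Candidates:\n{options}\n"
        "Answer with the number of the best match, or 0 if none fits."
    )


if __name__ == "__main__":
    sentence = "The patient had seizures and microcephaly."
    for mention in extract_mentions(sentence, max_len=1):
        candidates = retrieve(mention)
        # Skip spans with no lexical overlap with any ontology label.
        if candidates[0][0] > 0.5:
            print(build_prompt(mention, sentence, candidates))
```

Because the ontology enters only through the retrieval index and the prompt text, swapping in a different ontology in this sketch requires no retraining, which is the property the abstract emphasizes. The 0.5 similarity threshold is an arbitrary choice for the toy example.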