Comparative Analysis of LoRA-Adapted Embedding Models for Clinical Cardiology Text Representation
This study offers practical guidance for clinical NLP system development by challenging the assumption that larger models always yield better domain-specific embeddings, although its contribution is incremental because it compares existing adaptation methods.
This study compared ten transformer-based embedding models adapted for cardiology using LoRA fine-tuning on 106,535 cardiology text pairs, finding that encoder-only architectures like BioLinkBERT achieved superior domain-specific performance (separation score: 0.510) with fewer computational resources than larger decoder-based models.
Domain-specific text embeddings are critical for clinical natural language processing, yet systematic comparisons across model architectures remain limited. This study evaluates ten transformer-based embedding models adapted for cardiology through Low-Rank Adaptation (LoRA) fine-tuning on 106,535 cardiology text pairs derived from authoritative medical textbooks. Results demonstrate that encoder-only architectures, particularly BioLinkBERT, achieve superior domain-specific performance (separation score: 0.510) compared to larger decoder-based models, while requiring significantly fewer computational resources. The findings challenge the assumption that larger language models necessarily produce better domain-specific embeddings and provide practical guidance for clinical NLP system development. All models, training code, and evaluation datasets are publicly available to support reproducible research in medical informatics.
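For readers unfamiliar with LoRA, the following minimal NumPy sketch illustrates the general idea of a low-rank update to a frozen weight matrix and why it keeps the number of trainable parameters small. It is not the study's training code: the hidden size (768), rank (8), scaling factor (16), and the helper names `make_lora_params` and `lora_forward` are illustrative assumptions chosen for this example.

```python
import numpy as np


def make_lora_params(d_out, d_in, r, rng):
    """Create the two low-rank factors used by LoRA.

    A is initialised with small random values and B with zeros, so the
    adapted layer starts out identical to the frozen base layer.
    """
    A = rng.normal(0.0, 0.01, size=(r, d_in))   # shape (r, d_in)
    B = np.zeros((d_out, r))                    # shape (d_out, r)
    return A, B


def lora_forward(x, W, A, B, alpha):
    """Apply a linear layer with a LoRA update: x @ (W + (alpha / r) * B @ A).T

    W stays frozen; only A and B would be updated during fine-tuning.
    """
    r = A.shape[0]
    base = x @ W.T                               # frozen pretrained projection
    update = (alpha / r) * (x @ A.T) @ B.T       # low-rank correction
    return base + update


# Illustrative sizes only (assumptions, not values reported in the study).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 768, 8, 16

W = rng.normal(size=(d_out, d_in))               # stands in for a pretrained weight
A, B = make_lora_params(d_out, d_in, r, rng)
x = rng.normal(size=(4, d_in))                   # a batch of 4 token vectors

base_out = x @ W.T
lora_out = lora_forward(x, W, A, B, alpha)

full_params = W.size                             # parameters in the frozen matrix
lora_params = A.size + B.size                    # parameters LoRA would train

print(f"trainable fraction: {lora_params / full_params:.4f}")
```

Under these assumed sizes, the adapter trains 12,288 parameters against 589,824 in the frozen matrix, roughly 2% of the layer. That ratio is the reason LoRA makes it feasible to adapt many models of different sizes on the same set of text pairs.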