A Continued Pretrained LLM Approach for Automatic Medical Note Generation
This work addresses the high cost of using advanced LLMs in medical settings by offering a cost-effective alternative with competitive performance, though the contribution is incremental because it builds on the existing LLaMA2 model.
The authors tackle the high cost of advanced LLMs such as GPT-4 in specialized fields by introducing HEAL, a 13B LLaMA2-based LLM with continued pretraining for medical conversations. HEAL outperforms GPT-4 on PubMedQA with 78.4% accuracy, achieves parity with GPT-4 in medical note generation, and identifies more correct medical concepts than GPT-4 and Med-PaLM 2.
LLMs are revolutionizing NLP tasks. However, the most advanced LLMs, such as GPT-4, are often prohibitively expensive to use in most specialized fields. We introduce HEAL, the first 13B LLaMA2-based LLM with continued pretraining that is purpose-built for medical conversations and evaluated on automated scribing. Our results demonstrate that HEAL outperforms GPT-4 and PMC-LLaMA on PubMedQA, with an accuracy of 78.4%. It also achieves parity with GPT-4 in generating medical notes. Remarkably, HEAL identifies more correct medical concepts than GPT-4 and Med-PaLM 2, and exceeds the performance of human scribes and other comparable models in correctness and completeness.