CLOct 13, 2023

Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model

Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, Fenglin Liu, Meng Cao, Ziming Wang, Xuxin Cheng, Zhu Lei, Zhenhua Guo

arXiv:2310.09089v28.533 citationsh-index: 19Has Code

Originality Incremental advance

AI Analysis

This addresses the problem of resource-heavy pre-training and overconfident predictions in medical LLMs for healthcare applications, though it appears incremental as it builds on existing methods with a new dataset.

The paper tackles the challenge of integrating large language models into healthcare by proposing a multi-stage training method combining Domain-specific Continued Pre-training, Supervised Fine-tuning, and Direct Preference Optimization, resulting in Qilin-Med achieving 40.0% accuracy on the CMExam test set and further improvements with Retrieval Augmented Generation reaching 42.8% accuracy.

Integrating large language models (LLMs) into healthcare holds great potential but faces challenges. Pre-training LLMs from scratch for domains like medicine is resource-heavy and often unfeasible. On the other hand, sole reliance on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain-specific insights. In response, we present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), SFT, and Direct Preference Optimization (DPO). In addition, we publish a 3Gb Chinese Medicine (ChiMed) dataset, encompassing medical question answering, plain texts, knowledge graphs, and dialogues, segmented into three training stages. The medical LLM trained with our pipeline, Qilin-Med, shows substantial performance improvement. In the CPT and SFT phases, Qilin-Med achieved 38.4% and 40.0% accuracy on the CMExam test set, respectively. It outperformed the basemodel Baichuan-7B (accuracy: 33.5%), by 7.5%. In the DPO phase, it scored 16.66 in BLEU-1 and 27.44 in ROUGE-1 on the Huatuo-26M test set, bringing further improvement to the SFT phase (12.69 in BLEU-1 and 24.21 in ROUGE-1). Additionally, we have further enhanced the model's performance through the Retrieval Augmented Generation (RAG) approach. Experiments demonstrate that Qilin-Med-RAG achieves an accuracy rate of 42.8% on CMExam. These results highlight the contribution of our novel training approach in building LLMs for medical applications.

View on arXiv PDF Code

Similar