CDrugRed: A Chinese Drug Recommendation Dataset for Discharge Medications in Metabolic Diseases
The work addresses the scarcity of non-English EHR datasets for drug recommendation systems, although its contribution is incremental, since it primarily introduces a new dataset.
The authors tackle intelligent drug recommendation by creating CDrugRed, the first publicly available Chinese dataset of discharge medications for metabolic diseases (5,894 records from 3,190 patients). They also benchmark state-of-the-art LLMs on it, and the best model achieves an F1 score of 0.5648 and a Jaccard score of 0.4477.
Intelligent drug recommendation based on Electronic Health Records (EHRs) is critical for improving the quality and efficiency of clinical decision-making. By leveraging large-scale patient data, drug recommendation systems can assist physicians in selecting the most appropriate medications according to a patient's medical history, diagnoses, laboratory results, and comorbidities. However, the advancement of such systems is significantly hampered by the scarcity of publicly available, real-world EHR datasets, particularly in languages other than English. In this work, we present CDrugRed, the first publicly available Chinese drug recommendation dataset focused on discharge medications for metabolic diseases. The dataset includes 5,894 de-identified records from 3,190 patients and contains comprehensive information such as patient demographics, medical history, clinical course, and discharge diagnoses. We assess the utility of CDrugRed by benchmarking several state-of-the-art large language models (LLMs) on the discharge medication recommendation task. Experimental results show that supervised fine-tuning improves model performance, but substantial room for improvement remains: the best model achieves an F1 score of 0.5648 and a Jaccard score of 0.4477. This result highlights the complexity of the clinical drug recommendation task and establishes CDrugRed as a challenging and valuable resource for developing more robust and accurate drug recommendation systems. The dataset is available to the research community, subject to the data usage agreements, at https://github.com/DUTIR-BioNLP/CDrugRed.
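The abstract reports F1 and Jaccard scores without defining them. The sketch below assumes the set-based, per-record definitions that are common in drug recommendation work. Under those definitions, each record's predicted discharge medications are compared with the prescribed set, and the scores are macro-averaged over records. CDrugRed's official evaluation may differ, for example in drug-name normalization or in how empty sets are handled. The function names and drug names below are hypothetical and chosen for illustration only.

```python
def jaccard(pred: set[str], gold: set[str]) -> float:
    """Jaccard similarity |P & G| / |P | G| between predicted and gold drug sets."""
    union = pred | gold
    if not union:
        return 1.0  # assumption: two empty sets count as perfect agreement
    return len(pred & gold) / len(union)


def f1(pred: set[str], gold: set[str]) -> float:
    """Set-based F1: harmonic mean of precision |P & G|/|P| and recall |P & G|/|G|."""
    overlap = len(pred & gold)
    if overlap == 0:
        # assumption: empty prediction for an empty gold set is correct, else score 0
        return 1.0 if not pred and not gold else 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(preds: list[set[str]], golds: list[set[str]]) -> dict[str, float]:
    """Macro-average the per-record Jaccard and F1 scores (hypothetical helper)."""
    if len(preds) != len(golds) or not preds:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    n = len(preds)
    return {
        "jaccard": sum(jaccard(p, g) for p, g in zip(preds, golds)) / n,
        "f1": sum(f1(p, g) for p, g in zip(preds, golds)) / n,
    }


if __name__ == "__main__":
    # Two hypothetical records: a partial match and an exact match.
    preds = [{"metformin", "atorvastatin", "aspirin"}, {"metformin"}]
    golds = [{"metformin", "atorvastatin", "insulin glargine", "amlodipine"}, {"metformin"}]
    scores = evaluate(preds, golds)
    print({k: round(v, 4) for k, v in scores.items()})  # → {'jaccard': 0.7, 'f1': 0.7857}
```

For the first record, two of the three predicted drugs are correct and two of the four prescribed drugs are recovered. That gives a Jaccard score of 2/5 = 0.4 and an F1 score of 4/7. Because Jaccard penalizes every extra or missing drug against the union of the two sets, it is never higher than F1 for the same record. This is consistent with the reported gap between the best model's F1 (0.5648) and Jaccard (0.4477).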