CL, AI, LG | Sep 27, 2024

Building a Chinese Medical Dialogue System: Integrating Large-scale Corpora and Novel Models

arXiv:2410.03521v2
h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity and knowledge gaps in Chinese medical dialogue systems, but the advance is incremental because it builds on existing PLMs such as BERT and GPT.

The paper tackles the lack of large-scale medical datasets and the limited medical knowledge of existing systems by constructing the Large-scale Chinese Medical Dialogue Corpora (LCMDC) and proposing new triage and consultation models. Experimental results on this dataset show that the proposed models are effective.

The global COVID-19 pandemic underscored major deficiencies in traditional healthcare systems and hastened the advancement of online medical services, especially medical triage and consultation. However, existing studies face two main challenges. First, large-scale, publicly available, domain-specific medical datasets are scarce because of privacy concerns; current datasets are small and cover only a few diseases, which limits the effectiveness of triage methods based on Pre-trained Language Models (PLMs). Second, existing methods lack medical knowledge and struggle to accurately understand professional terms and expressions in patient-doctor consultations. To overcome these obstacles, we construct the Large-scale Chinese Medical Dialogue Corpora (LCMDC), thereby addressing the data shortage in this field. We further propose a novel triage system that combines BERT-based supervised learning with prompt learning, as well as a GPT-based medical consultation model. To enhance domain knowledge acquisition, we pre-train the PLMs on our self-constructed background corpus. Experimental results on the LCMDC demonstrate the efficacy of our proposed systems.
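
The abstract mentions prompt learning for triage but does not say how the prompts are built. The sketch below shows the general cloze-style idea only: wrap the patient query in a template containing a [MASK] slot, then map the label word a masked language model prefers at that slot to a department. The template text, the label words, the department names, and both function names are hypothetical, and the model scores are passed in as a plain dictionary rather than computed by a real PLM. It is not the authors' implementation.

```python
from typing import Dict

# Hypothetical cloze template; the abstract does not give the real one.
TEMPLATE = "Patient says: {query} The patient should visit the [MASK] department."

# Hypothetical verbalizer: label word predicted at [MASK] -> department label.
VERBALIZER: Dict[str, str] = {
    "heart": "cardiology",
    "skin": "dermatology",
    "child": "pediatrics",
}


def build_prompt(query: str) -> str:
    """Wrap a patient query in the cloze template."""
    return TEMPLATE.format(query=query.strip())


def triage_from_mask_scores(mask_scores: Dict[str, float]) -> str:
    """Return the department whose label word scores highest at [MASK].

    mask_scores stands in for the probabilities a masked language model
    would assign to candidate words; words missing from it are treated
    as having the lowest possible score.
    """
    candidates = {word: mask_scores.get(word, float("-inf")) for word in VERBALIZER}
    best_word = max(candidates, key=candidates.get)
    return VERBALIZER[best_word]


if __name__ == "__main__":
    print(build_prompt("I have an itchy red rash on my arm."))
    # Made-up scores for illustration only.
    print(triage_from_mask_scores({"skin": 0.7, "heart": 0.2, "child": 0.1}))
```

In a real system the scores would come from a masked language model evaluated on the prompt. The supervised branch described in the abstract would instead train a classification head on labelled queries.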
