CL, AI, LG · Nov 16, 2023

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

arXiv:2311.09774v2 · 125 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the problem of domain adaptation for language models in medicine, particularly for Traditional Chinese Medicine, offering a simplified training approach that improves performance over existing models, though it is incremental in adapting known methods to a specific domain.

The authors tackle the challenge of adapting language models to specialized domains such as medicine by proposing a one-stage training protocol that unifies heterogeneous data into a simple input-output format. The resulting model, HuatuoGPT-II, achieves state-of-the-art performance on Chinese medicine benchmarks, outperforms proprietary models like ChatGPT and GPT-4 in some aspects, and achieved the best performance on a fresh Chinese National Medical Licensing Examination.

Adapting a language model to a specific domain, a.k.a. "domain adaptation", is a common practice when specialized knowledge, e.g. medicine, is not encapsulated in a general language model like Llama2. The challenge lies in the heterogeneity of data across the two training stages (pre-training and supervised training), as it varies in language, genre, and format. To tackle this and simplify the learning protocol, we propose to transform heterogeneous data from both the pre-training and supervised stages into a unified, simple input-output pair format. We validate the new protocol in domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine. The developed model, HuatuoGPT-II, has shown state-of-the-art performance in the Chinese medicine domain on a number of benchmarks, e.g. medical licensing exams. It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine. Expert manual evaluations further validate HuatuoGPT-II's advantages over existing LLMs. Notably, HuatuoGPT-II was benchmarked on a fresh Chinese National Medical Licensing Examination, where it achieved the best performance, showcasing not only its effectiveness but also its generalization capabilities.
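
The unification step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Pair` record, the function names, and the fixed question template in `corpus_to_pair` are all hypothetical, chosen only to show how raw corpus passages and existing question-answer data could end up in one input-output format that a single training stage consumes. How the paper actually derives an input for a raw passage is not stated in the abstract, so the template here is a placeholder assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Pair:
    """One training example in the unified input-output format (hypothetical)."""
    input: str
    output: str
    source: str  # "corpus" or "sft"; kept only for bookkeeping


def corpus_to_pair(passage: str, topic: str) -> Pair:
    # Assumption: a fixed template stands in for whatever procedure
    # turns a raw pre-training passage into an input prompt.
    prompt = f"Explain the following medical topic: {topic}"
    return Pair(input=prompt, output=passage.strip(), source="corpus")


def sft_to_pair(question: str, answer: str) -> Pair:
    # Supervised data is already question/answer shaped; only normalise it.
    return Pair(input=question.strip(), output=answer.strip(), source="sft")


def unify(
    corpus: Iterable[Tuple[str, str]],
    sft: Iterable[Tuple[str, str]],
) -> List[Pair]:
    """Merge both data sources into one list of input-output pairs."""
    pairs = [corpus_to_pair(passage, topic) for passage, topic in corpus]
    pairs.extend(sft_to_pair(q, a) for q, a in sft)
    return pairs


if __name__ == "__main__":
    # Placeholder data, made up for illustration only.
    corpus = [("  Example passage about hypertension.  ", "hypertension")]
    sft = [("Example question? ", " Example answer.")]
    for pair in unify(corpus, sft):
        print(pair)
```

Once every example has the same `input`/`output` shape, the two data sources can be mixed in a single training run rather than fed to separate pre-training and supervised stages, which is the simplification the abstract claims.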

Code Implementations: 1 repo
