CL AI Jan 20, 2024

Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines

arXiv:2401.11120v2 34 citations ICHI
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate AI-driven clinical recommendations, particularly for COVID-19 outpatient treatment. The advance is incremental because it builds on existing LLM methods.

The study aims to improve large language models (LLMs) for clinical decision support by incorporating clinical practice guidelines (CPGs). All four tested LLMs performed better with CPGs than without them, and the Binary Decision Tree method outperformed the other two proposed methods in automatic evaluation.

Background: Large Language Models (LLMs), enhanced with Clinical Practice Guidelines (CPGs), can significantly improve Clinical Decision Support (CDS). However, methods for incorporating CPGs into LLMs are not well studied.

Methods: We develop three distinct methods for incorporating CPGs into LLMs: Binary Decision Tree (BDT), Program-Aided Graph Construction (PAGC), and Chain-of-Thought-Few-Shot Prompting (CoT-FSP). To evaluate the effectiveness of the proposed methods, we create a set of synthetic patient descriptions and conduct both automatic and human evaluation of the responses generated by four LLMs: GPT-4, GPT-3.5 Turbo, LLaMA, and PaLM 2. Zero-Shot Prompting (ZSP) was used as the baseline method. We focus on CDS for COVID-19 outpatient treatment as the case study.

Results: All four LLMs exhibit improved performance when enhanced with CPGs compared to the baseline ZSP. BDT outperformed both CoT-FSP and PAGC in automatic evaluation. All of the proposed methods demonstrated high performance in human evaluation.

Conclusion: LLMs enhanced with CPGs demonstrate superior performance, as compared to plain LLMs with ZSP, in providing accurate recommendations for COVID-19 outpatient treatment, which also highlights the potential for broader applications beyond the case study.

