Large Language Models Perform Diagnostic Reasoning
The work addresses the challenge of improving diagnostic accuracy in healthcare with large language models. Its contribution is incremental, as it builds on existing prompting techniques.
The paper tackles the problem of improving automatic medical diagnosis by extending chain-of-thought prompting to mimic doctors' reasoning. This yields a 15% accuracy improvement over standard prompting and an 18% gain in out-of-domain settings.
We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by the reasoning process underlying doctors' diagnoses, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results show that simply prompting large language models trained only on general text corpora with two DR-CoT exemplars improves diagnostic accuracy by 15% compared to standard prompting. Moreover, the gap reaches a pronounced 18% in out-of-domain settings. Our findings suggest that expert-knowledge reasoning in large language models can be elicited through proper prompting.
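The abstract describes DR-CoT only as a few-shot prompt with two exemplars whose diagnoses are preceded by a doctor-style rationale. It contrasts this with standard prompting, which in the CoT literature means exemplars that state the answer alone. The sketch below shows one way such prompts might be assembled. It is not the authors' template: the `Patient:`/`Reasoning:`/`Diagnosis:` layout, the helper names `build_dr_cot_prompt` and `build_standard_prompt`, and both exemplar cases are hypothetical, and the call to a language model is omitted.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One hypothetical few-shot demonstration: case, rationale, answer."""
    case: str
    reasoning: str  # clinician-style rationale (illustrative, not from the paper)
    diagnosis: str


# Two toy exemplars, matching the "two DR-CoT exemplars" setting.
# Their wording is invented for illustration only.
EXEMPLARS = [
    Exemplar(
        case="Burning on urination and increased urinary frequency for two days; no fever.",
        reasoning=(
            "Dysuria with frequency points to the lower urinary tract. "
            "No fever or flank pain makes an upper-tract infection less likely. "
            "The most likely explanation is a lower urinary tract infection."
        ),
        diagnosis="Cystitis",
    ),
    Exemplar(
        case="Sneezing, itchy eyes and clear nasal discharge every spring; no fever.",
        reasoning=(
            "Itchy eyes and clear discharge suggest an allergic rather than infectious cause. "
            "The seasonal pattern and absence of fever support this."
        ),
        diagnosis="Allergic rhinitis",
    ),
]


def build_dr_cot_prompt(exemplars: list[Exemplar], query_case: str) -> str:
    """Few-shot CoT prompt: each exemplar shows its rationale before the diagnosis."""
    blocks = [
        f"Patient: {ex.case}\nReasoning: {ex.reasoning}\nDiagnosis: {ex.diagnosis}"
        for ex in exemplars
    ]
    # End at "Reasoning:" so the model writes its own rationale first.
    blocks.append(f"Patient: {query_case}\nReasoning:")
    return "\n\n".join(blocks)


def build_standard_prompt(exemplars: list[Exemplar], query_case: str) -> str:
    """Baseline: same exemplars, but answers only, with no rationale."""
    blocks = [f"Patient: {ex.case}\nDiagnosis: {ex.diagnosis}" for ex in exemplars]
    blocks.append(f"Patient: {query_case}\nDiagnosis:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    query = "Fever, productive cough and pleuritic chest pain for three days."
    print(build_dr_cot_prompt(EXEMPLARS, query))
```

Under these assumptions, the two conditions differ only in whether the exemplars spell out intermediate reasoning. The model is then expected to continue the DR-CoT prompt with its own rationale before naming a diagnosis.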