CL LG Jun 7, 2023

Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning

Brihat Sharma, Yanjun Gao, Timothy Miller, Matthew M. Churpek, Majid Afshar, Dmitriy Dligach

Harvard

arXiv:2306.04551v2 26.3224 citations h-index: 40

Originality Incremental advance

AI Analysis

This research addresses diagnostic errors in healthcare by optimizing AI systems for clinical reasoning, though it is incremental as it builds on existing benchmarks and methods.

The study aims to improve clinical diagnostic reasoning by comparing in-domain versus out-of-domain language models and multi-task versus single-task training on the DR.BENCH framework, with a focus on the problem summarization task. A multi-task, clinically trained model achieves a new state-of-the-art ROUGE-L score of 28.55.

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing a new state of the art with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
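For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a ROUGE-L score can be computed between a reference summary and a generated one. It assumes the F-measure form of ROUGE-L over lowercased, whitespace-split tokens; the abstract does not state which variant, tokenizer, or evaluation package the authors used, so this is illustrative only. The function names and the example strings are hypothetical and are not taken from DR.BENCH.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the subsequence ending before both.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F-measure in [0, 1] (assumed variant: balanced F1)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of candidate tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented strings for illustration; not drawn from any clinical dataset.
    reference = "acute kidney injury sepsis"
    candidate = "sepsis acute kidney injury"
    print(round(100 * rouge_l_f1(reference, candidate), 2))  # prints 75.0
```

Scores such as 28.55 are conventionally this kind of value scaled to a 0-100 range and averaged over a test set. Published numbers usually come from an established ROUGE package with its own tokenization and stemming, so a sketch like this will not reproduce them exactly.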

