CL AI HC

May 19, 2023

Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews

Hye Sun Yun, Iain J. Marshall, Thomas A. Trikalinos, Byron C. Wallace

arXiv:2305.11828v3

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of producing timely and reliable medical evidence reviews for healthcare decision-makers. Its contribution is incremental: it builds on existing LLM capabilities and adds expert input.

The study investigated the potential of large language models (LLMs) to automate medical systematic reviews, which are time-consuming to produce but critical for healthcare. In 16 interviews, experts identified benefits such as drafting summaries and risks such as inaccurate outputs and decreased accountability.

Medical systematic reviews play a vital role in healthcare decision making and policy. However, their production is time-consuming, limiting the availability of high-quality and up-to-date evidence summaries. Recent advancements in large language models (LLMs) offer the potential to automatically generate literature reviews on demand, addressing this issue. However, LLMs sometimes generate inaccurate (and potentially misleading) texts by hallucination or omission. In healthcare, this can make LLMs unusable at best and dangerous at worst. We conducted 16 interviews with international systematic review experts to characterize the perceived utility and risks of LLMs in the specific context of medical evidence reviews. Experts indicated that LLMs can assist in the writing process by drafting summaries, generating templates, distilling information, and crosschecking information. They also raised concerns regarding confidently composed but inaccurate LLM outputs and other potential downstream harms, including decreased accountability and proliferation of low-quality reviews. Informed by this qualitative analysis, we identify criteria for rigorous evaluation of biomedical LLMs aligned with domain expert views.
