CL · AI · CY · Oct 24, 2024

Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use

Georgia Tech
arXiv:2410.19155v3 · 17 citations · h-index: 13 · NAACL
Originality: Incremental advance
AI Analysis

This work addresses a critical gap in healthcare for mental health patients, but the advance is incremental: it builds on existing LLM evaluation methods and contributes a new benchmark.

The paper examines how well Large Language Models (LLMs) align with experts in detecting Adverse Drug Reactions (ADRs) from psychiatric medications and in providing effective harm reduction strategies. It finds that LLM responses are only 70.86% aligned with expert strategies and offer 12.32% less actionable advice on average.

Adverse Drug Reactions (ADRs) from psychiatric medications are the leading cause of hospitalizations among mental health patients. With healthcare systems and online communities facing limitations in resolving ADR-related issues, Large Language Models (LLMs) have the potential to fill this gap. Despite the increasing capabilities of LLMs, past research has not explored their capabilities in detecting ADRs related to psychiatric medications or in providing effective harm reduction strategies. To address this, we introduce the Psych-ADR benchmark and the Adverse Drug Reaction Response Assessment (ADRA) framework to systematically evaluate LLM performance in detecting ADR expressions and delivering expert-aligned mitigation strategies. Our analyses show that LLMs struggle with understanding the nuances of ADRs and differentiating between types of ADRs. While LLMs align with experts in terms of expressed emotions and tone of the text, their responses are more complex, harder to read, and only 70.86% aligned with expert strategies. Furthermore, they provide less actionable advice by a margin of 12.32% on average. Our work provides a comprehensive benchmark and evaluation framework for assessing LLMs in strategy-driven tasks within high-risk domains.

Code Implementations: 1 repo
