AI · Nov 21, 2023

From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models

UW
arXiv:2311.13063v3 · 33 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses the problem of giving mental health professionals interpretable tools for analyzing sensor data. Its contribution is incremental: it enhances clinical decision-making through human-AI collaboration rather than standalone automation.

The paper tackles the challenge of turning passively collected behavioral health data into clinical insights by using large language models (LLMs) to generate reasoning about data trends. It reports 61.1% accuracy in binary depression classification and proposes a human-AI collaboration approach in which clinicians combine their expertise with AI-generated reasoning. GPT-4 correctly references numerical data 75% of the time.

Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals with insights from patients' daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental health. To address these challenges, we take a novel approach that leverages large language models (LLMs) to synthesize clinically useful insights from multi-sensor data. We develop chain-of-thought prompting methods that use LLMs to generate reasoning about how trends in data such as step count and sleep relate to conditions like depression and anxiety. We first demonstrate binary depression classification with LLMs, achieving an accuracy of 61.1%, which exceeds the state of the art. While this classification is not robust enough for clinical use, it leads us to our key finding: even more impactful and valued than classification is a new human-AI collaboration approach in which clinician experts interactively query these tools and combine their domain expertise and context about the patient with AI-generated reasoning to support clinical decision-making. We find that models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
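As a rough illustration of what chain-of-thought prompting over sensor data might look like, the sketch below formats step-count and sleep series into a text prompt that asks a model to reason about trends before answering. The function name `build_cot_prompt`, the prompt wording, and the sample values are all hypothetical; this is not the authors' prompt or pipeline, and no model is called.

```python
from statistics import mean


def build_cot_prompt(daily_steps, sleep_hours):
    """Format two sensor series into a chain-of-thought style prompt.

    The wording is an assumption made for illustration, not the
    prompt used in the paper.
    """
    lines = [
        "The following are daily self-tracking measurements for one person.",
        "Step count per day: " + ", ".join(str(s) for s in daily_steps),
        "Sleep hours per night: " + ", ".join(f"{h:.1f}" for h in sleep_hours),
        f"Mean steps: {mean(daily_steps):.0f}; mean sleep: {mean(sleep_hours):.1f} h",
        "",
        # Ask for explicit reasoning before the final label.
        "Think step by step: describe the trend in each signal, explain how",
        "those trends may relate to depression, and cite the numbers you use.",
        "Then give a final line of the form 'Depressed: yes' or 'Depressed: no'.",
    ]
    return "\n".join(lines)


# Made-up sample values, for demonstration only.
prompt = build_cot_prompt([4200, 3900, 2500], [7.5, 6.0, 5.5])
print(prompt)
```

The resulting string could be sent to any LLM client. Asking the model to cite the numbers it uses makes it easier to check afterwards whether the reasoning references the data correctly.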

Code Implementations: 1 repo
