CL | AI | Mar 28, 2023

Evaluation of ChatGPT for NLP-based Mental Health Applications

arXiv:2303.15727v1 | 123 citations | h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work assesses the potential of large language models for mental health applications, showing incremental improvements over simple baselines in specific classification tasks.

The study evaluated ChatGPT's performance on three mental health classification tasks using social media data, achieving F1 scores of 0.73 for stress detection, 0.86 for depression detection, and 0.37 for suicidality detection. Each score exceeded that of a baseline that always predicted the dominant class (0.35, 0.60, and 0.19, respectively).

Large language models (LLMs) have been successful in several natural language understanding tasks and could be relevant for natural language processing (NLP)-based mental health application research. In this work, we report the performance of LLM-based ChatGPT (with the gpt-3.5-turbo backend) in three text-based mental health classification tasks: stress detection (2-class classification), depression detection (2-class classification), and suicidality detection (5-class classification). We obtained annotated social media posts for the three classification tasks from public datasets. The ChatGPT API then classified each social media post in response to a classification prompt. We obtained F1 scores of 0.73, 0.86, and 0.37 for stress detection, depression detection, and suicidality detection, respectively. A baseline model that always predicted the dominant class resulted in F1 scores of 0.35, 0.60, and 0.19. The zero-shot classification performance obtained with ChatGPT indicates a potential use of language models for mental health classification tasks.
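To make the baseline comparison concrete, the following is a minimal sketch of how a dominant-class baseline can be scored with F1. It assumes macro-averaged F1, which the abstract does not specify, and it uses invented toy labels rather than the paper's datasets. The function names (`f1_for_label`, `macro_f1`, `majority_baseline`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # By convention here, F1 is 0 when the class is never predicted or present.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the paper)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def majority_baseline(y_true):
    """Predict the most frequent label for every example."""
    dominant = Counter(y_true).most_common(1)[0][0]
    return [dominant] * len(y_true)


# Toy labels for illustration only; these are not the paper's data.
y_true = ["stress"] * 6 + ["no_stress"] * 4
baseline_pred = majority_baseline(y_true)
baseline_score = macro_f1(y_true, baseline_pred)
print(f"majority-class macro F1: {baseline_score:.3f}")
```

Under macro averaging, a dominant-class baseline scores zero on every class it never predicts, which is why such baselines stay low even when the dominant class is large. The same `macro_f1` function could be applied to model predictions to reproduce the kind of comparison the abstract reports.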

