CL Apr 25, 2024

Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums

Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin

arXiv:2404.16461v21.01 citations h-index: 20

Originality: Incremental advance

AI Analysis

The work addresses the need for cost- and time-efficient mental health monitoring in adolescents, though it is incremental: it applies existing LLMs to a new domain, with some limitations.

The study tackled the problem of identifying mental health factors in adolescent online forums by comparing expert annotations with those from LLMs like GPT-3.5 and GPT-4, finding that GPT-4 performed on par with human inter-annotator agreement and showed substantially higher performance on synthetic data.

Mental health in children and adolescents has been steadily deteriorating over the past few years. The recent advent of Large Language Models (LLMs) offers much hope for cost- and time-efficient scaling of monitoring and intervention, yet despite specifically prevalent issues such as school bullying and eating disorders, previous studies have not investigated performance in this domain or for open information extraction, where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19, annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT, and compare expert labels to annotations from two top-performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher. However, we find the model still occasionally errs on issues of negation and factuality, and the higher performance on synthetic data is driven by the greater complexity of real data rather than an inherent advantage.
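
To make "on par with human inter-annotator agreement" concrete, here is a minimal sketch of one way to measure agreement between two annotators. It rests on several assumptions that do not come from the abstract: the metric (Cohen's kappa) is chosen for illustration only, the open-ended annotations are reduced to a binary "category present / absent" flag per post, and the function name `cohen_kappa` and the toy labels are hypothetical. The paper's actual evaluation procedure may differ.

```python
from collections import Counter

# Category names are taken from the abstract.
CATEGORIES = ["TRAUMA", "PRECARITY", "CONDITION",
              "SYMPTOMS", "SUICIDALITY", "TREATMENT"]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences.

    Illustrative metric choice (an assumption); it corrects raw
    agreement for the agreement expected by chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k]
                   for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy, made-up labels: 1 = category present in the post, 0 = absent.
expert = [1, 1, 0, 0]
model = [1, 0, 0, 0]
kappa = cohen_kappa(expert, model)
```

Under this setup, one would compute the score once for an expert-expert pair and once for an expert-model pair in each category, then compare the two values.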
