Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation
This work addresses a critical problem in mental health applications by highlighting the limitations of LLMs for suicide prevention. The contribution is incremental in that it builds on existing evaluation frameworks, but it introduces a novel dataset and a specific focus on implicit suicidal ideation.
The study evaluated eight large language models on two tasks: identifying implicit suicidal ideation and providing supportive responses. It found that current models struggle significantly with both, showing low accuracy and high error rates.
We present a comprehensive evaluation framework for assessing the capabilities of Large Language Models (LLMs) in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce \ourdata, a novel dataset of 1,308 test cases built upon psychological frameworks, including D/S-IAT and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with 8 widely used LLMs under different contextual settings, we find that current models struggle significantly with detecting implicit suicidal ideation and providing appropriate support, highlighting crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches to developing and evaluating LLMs for sensitive psychological applications.