LG · Feb 18, 2021

Deep Learning for Suicide and Depression Identification with Unsupervised Label Correction

Ayaan Haque, Viraaj Reddi, Tyler Giallanza

arXiv:2102.09427v2 · 22.381 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses an important clinical challenge for mental-health monitoring by improving classification accuracy on noisy web-scraped data, though the contribution is incremental because it builds on existing deep learning approaches to text classification.

The paper tackles the problem of distinguishing between depression and suicidal ideation in text, using online Reddit data with noisy labels, and proposes an unsupervised label correction method that achieves strong performance without requiring prior noise distribution information.

Early detection of suicidal ideation in depressed individuals can allow for adequate medical attention and support, which in many cases is life-saving. Recent NLP research focuses on classifying, from a given piece of text, whether an individual is suicidal or clinically healthy. However, there have been no major attempts to differentiate between depression and suicidal ideation, which is an important clinical challenge. Due to the scarcity of electronic health record (EHR) data, suicide notes, and other similar verified sources, web query data has emerged as a promising alternative. Online sources such as Reddit allow for anonymity that prompts honest disclosure of symptoms, making them a plausible source even in a clinical setting. However, models trained on these online datasets also perform worse, which can be attributed to the inherent noise in web-scraped labels; this noise necessitates a noise-removal process. Thus, we propose SDCNL, a deep learning method for classifying suicide versus depression. We utilize online content from Reddit to train our algorithm, and to verify and correct noisy labels, we propose a novel unsupervised label correction method which, unlike previous work, does not require prior information about the noise distribution. Our extensive experiments with multiple deep word embedding models and classifiers demonstrate the strong performance of the method in a new, challenging classification application. We make our code and dataset available at https://github.com/ayaanzhaque/SDCNL.
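The abstract does not say how the label correction itself works. As a rough illustration only, one generic way to correct binary labels without a noise model is to cluster the text embeddings and flip a label only when the clustering disagrees with it confidently. The sketch below assumes precomputed embedding vectors (for example, from a deep word embedding model), encodes depression as 0 and suicidal ideation as 1, and uses hypothetical helpers `two_means` and `correct_labels` with a hypothetical `threshold` parameter. It relies on plain 2-means clustering and a distance-ratio confidence, and it is not taken from the SDCNL repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def _nearest(p: Vector, centroids: List[List[float]]) -> int:
    """Index (0 or 1) of the closer centroid; ties go to cluster 0."""
    return 0 if _dist(p, centroids[0]) <= _dist(p, centroids[1]) else 1


def two_means(
    points: List[Vector], init: Tuple[Vector, Vector], iters: int = 20
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's 2-means from fixed initial centroids; returns (assignments, centroids)."""
    centroids = [list(init[0]), list(init[1])]
    for _ in range(iters):
        assign = [_nearest(p, centroids) for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = _mean(members)
    # Final assignment against the final centroids, so each point's own
    # cluster is also its nearest one.
    return [_nearest(p, centroids) for p in points], centroids


def correct_labels(
    embeddings: List[Vector], noisy_labels: List[int], threshold: float = 0.7
) -> List[int]:
    """Flip a binary label only when an unsupervised 2-means clustering
    confidently disagrees with it. Hypothetical helper, not the SDCNL code."""
    class0 = [e for e, y in zip(embeddings, noisy_labels) if y == 0]
    class1 = [e for e, y in zip(embeddings, noisy_labels) if y == 1]
    if not class0 or not class1:
        raise ValueError("both labels 0 and 1 must be present")
    # Seed each centroid with the mean embedding of one noisy class.
    assign, centroids = two_means(embeddings, (_mean(class0), _mean(class1)))
    # Align cluster ids with label ids: swap if that agrees with more labels.
    agree = sum(a == y for a, y in zip(assign, noisy_labels))
    if agree < len(assign) - agree:
        assign = [1 - a for a in assign]
        centroids = centroids[::-1]
    corrected = []
    for e, y, a in zip(embeddings, noisy_labels, assign):
        d_own, d_other = _dist(e, centroids[a]), _dist(e, centroids[1 - a])
        # Confidence lies in [0.5, 1]; 1.0 means the point sits on its own centroid.
        total = d_own + d_other
        conf = d_other / total if total > 0 else 0.5
        corrected.append(a if a != y and conf >= threshold else y)
    return corrected


if __name__ == "__main__":
    # 0 = depression, 1 = suicidal ideation (hypothetical encoding, toy 2-D "embeddings")
    emb = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    noisy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
    print(correct_labels(emb, noisy))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Seeding the centroids from the noisy class means keeps cluster ids roughly aligned with label ids, and `threshold` trades off how many labels get flipped against the risk of flipping correct ones; with a threshold above 1.0, no label is ever changed.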

