Data Quality Matters: Suicide Intention Detection on Social Media Posts Using RoBERTa-CNN
This work addresses early detection of suicide risk for mental health applications, but its contribution is incremental: it combines two existing methods (RoBERTa and a CNN) and evaluates them on one specific dataset.
The paper tackles suicide intention detection in social media posts by proposing a RoBERTa-CNN model, which achieves a mean accuracy of 98% with a standard deviation of 0.0009 on the Suicide and Depression Detection dataset.
Suicide remains a pressing global health concern, necessitating innovative approaches for early detection and intervention. This paper focuses on identifying suicidal intentions in posts from the SuicideWatch subreddit by proposing a novel deep-learning approach that utilizes the state-of-the-art RoBERTa-CNN model. The Robustly Optimized BERT Pretraining Approach (RoBERTa) excels at capturing textual nuances and forming semantic relationships within the text. The Convolutional Neural Network (CNN) head enhances RoBERTa's capacity to discern critical patterns from extensive datasets. To evaluate RoBERTa-CNN, we conducted experiments on the Suicide and Depression Detection dataset, yielding promising results. For instance, RoBERTa-CNN achieves a mean accuracy of 98% with a standard deviation (STD) of 0.0009. Additionally, we found that data quality significantly impacts the training of a robust model. To improve data quality, we removed noise from the text data while preserving its contextual content, either through manual cleaning or by using the OpenAI API.
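The abstract does not list the cleaning rules that were applied, so the following Python sketch only illustrates what rule-based noise removal that preserves a post's wording might look like. The function name `clean_post` and every pattern in it (link removal, Reddit handle removal, collapsing stretched characters) are assumptions made for illustration, not the authors' procedure, and the OpenAI API route is not covered.

```python
import html
import re

# Hypothetical cleaning rules; the paper does not state the exact steps it used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"(?<!\w)(?:u|r)/\w+")  # Reddit user/subreddit handles
REPEAT_RE = re.compile(r"(.)\1{3,}")           # one character repeated 4+ times
SPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Strip common noise from a post while leaving its wording intact."""
    text = html.unescape(text)           # decode entities such as "&amp;"
    text = URL_RE.sub(" ", text)         # drop links
    text = HANDLE_RE.sub(" ", text)      # drop u/name and r/name handles
    text = REPEAT_RE.sub(r"\1\1", text)  # shorten stretched runs to two chars
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_post("so   tired &amp; sad  http://example.com/a?b=1"))
```

The rules are deliberately conservative: they remove tokens that carry no linguistic content and leave punctuation, casing, and word order untouched, so that a downstream tokenizer still sees the original phrasing.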