Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models
This work addresses mental health monitoring for social media users, but it is incremental as it applies existing methods to a specific domain.
The paper tackled suicide risk prediction from social media posts using Reddit data, proposing Su-RoBERTa, a fine-tuned model that achieved a 69.84% weighted F1 score by leveraging labeled and unlabeled data with data augmentation.
In recent times, more and more people are posting about their mental states across various social media platforms. Leveraging this data, AI-based systems can be developed that help in assessing the mental health of individuals, such as suicide risk. This paper is a study done on suicidal risk assessments using Reddit data leveraging Base language models to identify patterns from social media posts. We have demonstrated that using smaller language models, i.e., less than 500M parameters, can also be effective in contrast to LLMs with greater than 500M parameters. We propose Su-RoBERTa, a fine-tuned RoBERTa on suicide risk prediction task that utilized both the labeled and unlabeled Reddit data and tackled class imbalance by data augmentation using GPT-2 model. Our Su-RoBERTa model attained a 69.84% weighted F1 score during the Final evaluation. This paper demonstrates the effectiveness of Base language models for the analysis of the risk factors related to mental health with an efficient computation pipeline