Mining Mental Health Signals: A Comparative Study of Four Machine Learning Methods for Depression Detection from Social Media Posts in Sorani Kurdish
It addresses early detection of depression for Kurdish-speaking populations using social media, but is incremental as it applies existing methods to a new language.
This study tackled the problem of detecting depression from social media posts in Sorani Kurdish, a language with no prior research, by collecting and annotating 960 tweets and training four machine learning models, with Random Forest achieving 80% accuracy and F1-score.
Depression is a common mental health condition that can lead to hopelessness, loss of interest, self-harm, and even suicide. Early detection is challenging due to individuals not self-reporting or seeking timely clinical help. With the rise of social media, users increasingly express emotions online, offering new opportunities for detection through text analysis. While prior research has focused on languages such as English, no studies exist for Sorani Kurdish. This work presents a machine learning and Natural Language Processing (NLP) approach to detect depression in Sorani tweets. A set of depression-related keywords was developed with expert input to collect 960 public tweets from X (Twitter platform). The dataset was annotated into three classes: Shows depression, Not-show depression, and Suspicious by academics and final year medical students at the University of Kurdistan Hewlêr. Four supervised models, including Support Vector Machines, Multinomial Naive Bayes, Logistic Regression, and Random Forest, were trained and evaluated, with Random Forest achieving the highest performance accuracy and F1-score of 80%. This study establishes a baseline for automated depression detection in Kurdish language contexts.