CL LG Jan 25, 2024

Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda

Richard Kimera, Daniela N. Rim, Joseph Kirabira, Ubong Godwin Udomah, Heeyoul Choi

arXiv:2401.14240v1 1.91 citations ICTC

Originality: Synthesis-oriented

AI Analysis

This work addresses early detection of depression severity for patients and clinicians using social media data, but it is incremental as it applies an existing method to new data with modest gains.

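The study addresses depression severity classification by extracting text from Reddit, labeling it with a proposed approach, and fine-tuning a Longformer model. On a custom-made dataset, the fine-tuned model outperforms the Naive Bayes, Random Forest, Support Vector Machine, and Gradient Boosting baselines in both English (48%) and Luganda (45%).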

Depression is a global burden and one of the most challenging mental health conditions to control. Experts can detect its severity early using the Beck Depression Inventory (BDI) questionnaire, administer appropriate medication to patients, and impede its progression. Due to the fear of potential stigmatization, many patients turn to social media platforms like Reddit for advice and assistance at various stages of their journey. This research extracts text from Reddit to facilitate the diagnostic process. It employs a proposed labeling approach to categorize the text and subsequently fine-tunes the Longformer model. The model's performance is compared against baseline models, including Naive Bayes, Random Forest, Support Vector Machines, and Gradient Boosting. Our findings reveal that the Longformer model outperforms the baseline models in both English (48%) and Luganda (45%) languages on a custom-made dataset.
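
The abstract does not spell out the proposed labeling approach. Purely as an illustration of what severity classes anchored to the BDI could look like, here is a minimal sketch that maps a questionnaire total to a class label. It assumes the commonly cited BDI-II cutoffs (0-13 minimal, 14-19 mild, 20-28 moderate, 29-63 severe); the names `Severity` and `bdi_severity` are hypothetical, and the paper's own label scheme may differ.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical class labels for a severity classifier."""
    MINIMAL = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Assumed BDI-II cutoffs as (inclusive upper bound, label).
# These are not taken from the paper; its labeling scheme may differ.
_CUTOFFS = [
    (13, Severity.MINIMAL),
    (19, Severity.MILD),
    (28, Severity.MODERATE),
    (63, Severity.SEVERE),
]


def bdi_severity(total_score: int) -> Severity:
    """Map a BDI-II total score (0-63) to a severity label."""
    if not 0 <= total_score <= 63:
        raise ValueError(f"BDI-II total must be in 0..63, got {total_score}")
    for upper_bound, label in _CUTOFFS:
        if total_score <= upper_bound:
            return label
    raise AssertionError("unreachable: all scores in 0..63 are covered")
```

A classifier such as the fine-tuned Longformer would then be trained to predict one of these labels from the post text; how the authors actually assign labels to Reddit posts is not described in this summary.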
