Multi-class Categorization of Reasons behind Mental Disturbance in Long Texts
This work addresses multi-class causal categorization for mental health analysis of social media data. The contribution is incremental: it adapts an existing model (Longformer) to a domain-specific bottleneck, namely the length of self-reported posts.
The paper tackles the problem of identifying causal indicators behind mental illness in long self-reported texts, such as Reddit posts of up to 4000 words. It uses Longformer to overcome the input-length limit of standard transformers and achieves a state-of-the-art F1-score of 62% on the M-CAMS dataset.
Motivated by recent advances in inferring users' mental states from social media posts, we identify and formulate the problem of finding causal indicators behind mental illness in self-reported text. Prior work on causal explanation analysis was rule-based and used curated Facebook data. Investigating transformer-based models for multi-class causal categorization of Reddit posts exposes a problem with long texts, which can contain as many as 4000 words: end-to-end transformer-based models are constrained by a maximum input length per instance. To handle this problem, we use Longformer and feed its encodings to a transformer-based classifier. Experimental results show that Longformer achieves a new state-of-the-art F1-score of 62% on M-CAMS, a publicly available dataset. A cause-specific analysis and an ablation study demonstrate the effectiveness of Longformer. We believe our work facilitates causal analysis of depression and suicide risk on social media data and shows potential for application to other mental health conditions.
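Why Longformer helps with long posts can be illustrated with its attention pattern. Instead of full self-attention, where every token attends to every other token, Longformer combines a sliding local window with a few tokens that have global attention (for classification, typically the leading classification token). The minimal sketch below is not the authors' classifier; it only counts which token pairs may attend to each other. The helper names `longformer_style_mask` and `attended_pair_count` are hypothetical, and the 4096-token input, 512-token window, and global attention on position 0 are assumptions chosen to resemble publicly released Longformer-base settings, not configurations reported in this work.

```python
from typing import Iterable, List


def longformer_style_mask(seq_len: int, window: int, global_positions: Iterable[int]) -> List[List[bool]]:
    """Boolean attention mask: mask[i][j] is True when token i may attend to token j.

    Local attention: each token sees window // 2 neighbours on each side.
    Global attention: a global token attends to every token, and every token
    attends to it (symmetric).
    """
    half = window // 2
    global_set = set(global_positions)
    return [
        [abs(i - j) <= half or i in global_set or j in global_set for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pair_count(seq_len: int, window: int, global_positions: Iterable[int]) -> int:
    """Count allowed (i, j) pairs without materialising the seq_len x seq_len mask."""
    half = window // 2
    global_set = set(global_positions)
    total = 0
    for i in range(seq_len):
        if i in global_set:
            total += seq_len  # a global token attends to the whole sequence
            continue
        lo, hi = max(0, i - half), min(seq_len - 1, i + half)
        # Tokens inside the local window, plus global tokens lying outside it.
        total += (hi - lo + 1) + sum(1 for g in global_set if g < lo or g > hi)
    return total


if __name__ == "__main__":
    # Assumed (hypothetical) settings: 4096-token input, 512-token window,
    # global attention only on position 0 (the classification token).
    n, w = 4096, 512
    sparse = attended_pair_count(n, w, [0])
    full = n * n
    print(f"sparse pairs: {sparse}, full pairs: {full}, ratio: {sparse / full:.3f}")
```

Under these assumed settings the sparse pattern keeps roughly 12% of the pairs that full self-attention would compute, and the count grows linearly with sequence length for a fixed window, which is what makes encoding inputs of several thousand tokens feasible.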