CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts
This paper provides a new annotated corpus for researchers analyzing the causes of mental health issues in social media posts, but the contribution is incremental, as it builds on existing datasets and methods.
The authors tackled causal analysis of mental health issues in social media by introducing the CAMS dataset, which combines 3155 newly annotated Reddit posts with 1896 re-annotated instances from the SDCNL dataset. They showed that a Logistic Regression model outperforms a CNN-LSTM model by 4.9% in accuracy.
The research community has witnessed substantial growth in the detection of mental health issues and their associated reasons from analysis of social media. We introduce a new dataset for Causal Analysis of Mental health issues in Social media posts (CAMS). Our contributions for causal analysis are two-fold: causal interpretation and causal categorization. We introduce an annotation schema for this task of causal analysis. We demonstrate the efficacy of our schema on two different datasets: (i) crawling and annotating 3155 Reddit posts and (ii) re-annotating the publicly available SDCNL dataset of 1896 instances for interpretable causal analysis. We further combine these into the CAMS dataset and make this resource publicly available along with associated source code: https://github.com/drmuskangarg/CAMS. We present experimental results of models learned from the CAMS dataset and demonstrate that a classic Logistic Regression model outperforms the next best (CNN-LSTM) model by 4.9% accuracy.
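The abstract reports that a classic Logistic Regression model is the strongest baseline for causal categorization on CAMS. To illustrate roughly what such a baseline does, here is a minimal sketch of multinomial (softmax) logistic regression over bag-of-words counts, written in plain Python. It is not the authors' implementation (their code is in the linked repository): the whitespace tokenizer, the learning rate and epoch count, the toy posts, and the label names `jobs`, `relationship`, and `medication` are all hypothetical placeholders chosen for illustration. They are not taken from the CAMS annotation schema or data.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split only.
    return text.lower().split()


class BowLogisticRegression:
    """Multinomial (softmax) logistic regression on bag-of-words counts (illustrative sketch)."""

    def __init__(self, lr=0.5, epochs=200, seed=0):
        self.lr = lr          # assumed SGD step size
        self.epochs = epochs  # assumed number of passes over the data
        self.seed = seed

    def _features(self, text):
        # Sparse feature vector: token -> count.
        return Counter(tokenize(text))

    def _scores(self, feats):
        # Linear score per class: bias + sum of weight * count.
        return {
            c: self.bias[c] + sum(self.w[c].get(t, 0.0) * v for t, v in feats.items())
            for c in self.classes
        }

    @staticmethod
    def _softmax(scores):
        # Subtract the max score for numerical stability.
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.w = {c: {} for c in self.classes}
        self.bias = {c: 0.0 for c in self.classes}
        data = [(self._features(t), y) for t, y in zip(texts, labels)]
        rng = random.Random(self.seed)
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, y in data:
                probs = self._softmax(self._scores(feats))
                for c in self.classes:
                    # Cross-entropy gradient w.r.t. class-c score: p_c - 1[c == y].
                    g = probs[c] - (1.0 if c == y else 0.0)
                    self.bias[c] -= self.lr * g
                    for t, v in feats.items():
                        self.w[c][t] = self.w[c].get(t, 0.0) - self.lr * g * v
        return self

    def predict(self, text):
        scores = self._scores(self._features(text))
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    # Fraction of posts whose predicted cause matches the gold label.
    return sum(model.predict(t) == y for t, y in zip(texts, labels)) / len(labels)


# Hypothetical toy posts and cause labels, invented for illustration only.
train_texts = [
    "lost my job and cannot pay rent",
    "my boss fired me today career is over",
    "my girlfriend left me and i feel alone",
    "broke up with my partner again",
    "the new medication makes me feel numb",
    "doctor changed my pills and dose",
]
train_labels = ["jobs", "jobs", "relationship", "relationship", "medication", "medication"]

model = BowLogisticRegression().fit(train_texts, train_labels)
print(model.predict("fired from my job"))
print(model.predict("my partner left me"))
```

A linear model over sparse word counts like this one is cheap to train and easy to inspect, since each learned weight links a word to a cause label. That is one plausible reason such a baseline can be competitive on a corpus of a few thousand posts, though the abstract itself does not give the reason. In practice one would use a library implementation with regularization and proper feature weighting (for example TF-IDF) rather than this hand-rolled version.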