LOST: A Mental Health Dataset of Low Self-esteem in Reddit Posts
This work addresses early screening for mental health risks in social media users. Its contribution is incremental: it builds on prior studies but shifts the focus from general symptoms to specific risk factors.
The authors address the detection of low self-esteem and interpersonal risk factors in social media posts. They introduce LoST, a psychology-grounded dataset annotated for supervised learning, and test deep language models with data augmentation techniques. Their findings point toward language models infused with psychological knowledge.
Low self-esteem and interpersonal needs (i.e., thwarted belongingness (TB) and perceived burdensomeness (PB)) have a major impact on depression and suicide attempts. Individuals seek social connectedness on social media to alleviate their loneliness. Social media platforms allow people to express their thoughts, experiences, beliefs, and emotions. Prior studies on mental health from social media have focused on symptoms, causes, and disorders. However, an initial screening of social media content for interpersonal risk factors and low self-esteem may raise early alerts and help assign therapists to users at risk of mental disturbance. Standardized scales measure self-esteem and interpersonal needs using questions derived from psychological theories. In the current research, we introduce a psychology-grounded and expertly annotated dataset, LoST: Low Self esTeem, to study and detect low self-esteem on Reddit. Through an annotation approach involving checks on coherence, correctness, consistency, and reliability, we ensure a gold-standard dataset for supervised learning. We present results from different deep language models tested using two data augmentation techniques. Our findings suggest developing a class of language models that infuse psychological and clinical knowledge.
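The reliability check mentioned in the abstract is typically quantified with an inter-annotator agreement statistic. The abstract does not say which statistic the authors used, so the following is only a minimal sketch, assuming two annotators and Cohen's kappa. The function name `cohen_kappa` and the binary labels are hypothetical and are not taken from the LoST dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (assumption): 1 = low self-esteem present, 0 = absent.
annotator_1 = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_2 = [1, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class dominates. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.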