Jun 6, 2023

Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health

Chandreen Liyanage, Muskan Garg, Vijay Mago, Sunghwan Sohn

arXiv:2306.04059v1 · h-index: 38

Originality: Synthesis-oriented

AI Analysis

This work addresses mental health pre-screening from social media, a problem of interest to researchers and practitioners, but its contribution is incremental: it applies existing generative models to a specific data augmentation task.

The paper tackles imbalanced data in classifying Wellness Dimensions from social media text by using prompt-based generative NLP models for data augmentation, improving the F-score by up to 13.11% and the Matthews Correlation Coefficient by up to 15.95% over baselines.
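The gains are reported in F-score and the Matthews Correlation Coefficient (MCC). MCC uses all cells of the confusion matrix, so it stays informative when class sizes are skewed. The sketch below is a minimal pure-Python illustration of both metrics for multi-class labels. It makes two assumptions that the summary does not confirm: the F-score is macro-averaged, and MCC is the standard multi-class generalisation, which reduces to the binary MCC for two classes. The function names and the toy labels are hypothetical and are not the paper's code or data.

```python
from collections import Counter
from math import sqrt


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (binary MCC for two classes)."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)  # how often each class truly occurs
    p_k = Counter(y_pred)  # how often each class is predicted
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in labels)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in labels)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in labels)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # degenerate case: only one class predicted or present
    return cov_tp / sqrt(cov_pp * cov_tt)


# Toy, illustrative labels only (not the paper's data).
y_true = ["physical", "social", "social", "spiritual", "intellectual", "social"]
y_pred = ["physical", "social", "spiritual", "spiritual", "social", "social"]
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}, MCC = {mcc(y_true, y_pred):.3f}")
# prints: macro-F1 = 0.583, MCC = 0.522
```

In the toy example the "intellectual" class is never predicted correctly. That pulls macro-F1 down more than plain accuracy would, and it is the kind of minority-class failure that augmentation aims to reduce.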

Amid the ongoing health crisis, there is a growing need to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. Because the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to further improve the pre-screening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach based on prompt-based generative NLP models, and we evaluate the ROUGE scores and the syntactic/semantic similarity between existing interpretations and the augmented data. Our approach with the ChatGPT model surpasses all other methods and improves over baselines such as Easy Data Augmentation (EDA) and Backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset improves the F-score and the Matthews Correlation Coefficient by up to 13.11% and 15.95%, respectively.
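The abstract describes the augmentation pipeline only at a high level. As a rough illustration, the sketch below tops up every minority class to the size of the largest class by sampling existing posts and asking a generator for paraphrases. It then scores each (source, generated) pair with a simplified ROUGE-1 F-measure. Several pieces are hypothetical assumptions, not the authors' implementation:

- `generate` stands in for a call to ChatGPT or any other generative model.
- `PROMPT` is an illustrative template, because the paper's prompt is not given here.
- The balancing target, "match the majority class", is one common choice that the abstract does not confirm.
- `rouge1_f` uses lowercase whitespace tokens without stemming, so it only approximates official ROUGE implementations.

```python
import random
from collections import Counter, defaultdict
from typing import Callable

# Hypothetical prompt template; the paper's actual prompt is not reproduced here.
PROMPT = "Paraphrase the following Reddit post, keeping its meaning:\n\n{post}"


def balance_with_generator(texts, labels, generate: Callable[[str], str], seed=0):
    """Top up each minority class to the majority-class size with generated samples.

    Returns the augmented texts, their labels, and (source, generated) pairs
    so the generated text can later be compared against its source.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(pool) for pool in by_label.values())
    out_texts, out_labels, pairs = list(texts), list(labels), []
    for label, pool in by_label.items():
        for _ in range(target - len(pool)):
            source = rng.choice(pool)  # sample an original post of this class
            generated = generate(PROMPT.format(post=source))
            out_texts.append(generated)
            out_labels.append(label)  # augmented sample inherits the source label
            pairs.append((source, generated))
    return out_texts, out_labels, pairs


def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F-measure: clipped unigram overlap, no stemming."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection clips repeated words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Usage with a placeholder generator that simply returns the post unchanged.
echo = lambda prompt: prompt.split("\n\n", 1)[1]
texts, labels, pairs = balance_with_generator(
    ["post a", "post b", "post c", "post d"],
    ["social", "social", "social", "physical"],
    echo,
)
print(Counter(labels), [round(rouge1_f(s, g), 2) for s, g in pairs])
```

A real generator would produce paraphrases. A very high ROUGE score then suggests near-copies, and a very low one suggests possible drift from the source meaning. Both signals are why the abstract reports similarity between the original and augmented text. Swapping `generate` for an EDA or back-translation routine would give the kind of baselines the abstract compares against.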
