Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users
This work addresses suicide prevention for social media users and offers a potential intervention tool, though its contribution is incremental: it applies existing NLP methods to a new domain.
The study addresses the problem of identifying individuals at high risk of suicide by analyzing linguistic features of Chinese microblog posts. It uses two NLP methods, LIWC and LDA, to extract features and train prediction models, and the results show that LDA improved performance in estimating suicide probability.
If people at high risk of suicide can be identified through social media such as microblogs, it becomes possible to implement an active intervention system to save their lives. Motivated by this, the current study administered the Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, a leading microblog service provider in China. Two NLP (Natural Language Processing) methods, the Chinese edition of the Linguistic Inquiry and Word Count (LIWC) lexicon and Latent Dirichlet Allocation (LDA), were used to extract linguistic features from the Sina Weibo data. We trained prediction models using machine learning on these two types of features to estimate suicide probability. The experimental results indicate that LDA can find topics related to suicide probability and improves prediction performance. Our study adds value to the prediction of suicide probability of social network users from their behaviors.
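As a rough illustration of the lexicon-based feature extraction described above, the following minimal sketch computes LIWC-style category proportions for one user. It is not the authors' implementation. The lexicon `TOY_LEXICON`, its category names, and the helper functions `category_proportions` and `user_features` are hypothetical. English tokens stand in for Chinese words, which would first need word segmentation, and the real Chinese LIWC lexicon is not reproduced here.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> words). This is an assumption for
# illustration only and is not the Chinese LIWC lexicon used in the study.
TOY_LEXICON = {
    "negemo": {"sad", "hopeless", "pain"},
    "posemo": {"happy", "hope"},
    "death": {"die", "death"},
}


def category_proportions(tokens, lexicon=TOY_LEXICON):
    """Return, for each category, the share of tokens belonging to it."""
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in lexicon.items():
            if tok in words:
                counts[category] += 1
    # An empty token list yields 0.0 for every category instead of dividing by zero.
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}


def user_features(posts, lexicon=TOY_LEXICON):
    """Pool all of a user's pre-tokenized posts into one feature dictionary."""
    tokens = [tok for post in posts for tok in post]
    return category_proportions(tokens, lexicon)


# Two pre-tokenized posts from one hypothetical user.
posts = [["i", "feel", "sad", "and", "hopeless"], ["no", "hope", "only", "pain"]]
features = user_features(posts)
```

Feature dictionaries of this kind, one per user, could then be combined with per-user LDA topic proportions and passed to a regression model that predicts the SPS score. The abstract does not specify which learning algorithm was used, so that step is left out of the sketch.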