CL · Oct 13, 2025

CNSocialDepress: A Chinese Social Media Dataset for Depression Risk Detection and Structured Analysis

Jinyuan Xu, Tian Lan, Xintao Yu, Xue He, Hezhi Zhang, Ying Wang, Pierre Magistry, Mathieu Valette, Lei Li

arXiv:2510.11233v1 · h-index: 1

Originality: Synthesis-oriented

AI Analysis

The paper provides a structured dataset for depression risk identification and analysis in Chinese-speaking populations. The contribution is incremental: it extends existing approaches to a new language and data domain.

The authors address the scarcity of Chinese-language resources for depression risk detection by releasing CNSocialDepress, a benchmark dataset of 44,178 texts from 233 users, within which psychological experts annotated 10,306 depression-related segments. Their experiments show the dataset's utility across a range of NLP tasks, including structured psychological profiling and LLM fine-tuning.

Depression is a pressing global public health issue, yet publicly available Chinese-language resources for risk detection remain scarce and are mostly limited to binary classification. To address this limitation, we release CNSocialDepress, a benchmark dataset for depression risk detection from Chinese social media posts. The dataset contains 44,178 texts from 233 users, within which psychological experts annotated 10,306 depression-related segments. CNSocialDepress provides binary risk labels together with structured multi-dimensional psychological attributes, enabling interpretable and fine-grained analysis of depressive signals. Experimental results demonstrate its utility across a wide range of NLP tasks, including structured psychological profiling and fine-tuning of large language models for depression detection. Comprehensive evaluations highlight the dataset's effectiveness and practical value for depression risk identification and psychological analysis, thereby providing insights to mental health applications tailored for Chinese-speaking populations.
