CL | Jan 25, 2025

SCCD: A Session-based Dataset for Chinese Cyberbullying Detection

arXiv:2501.15042v1 | 21 citations | h-index: 4 | COLING
Originality: Synthesis-oriented
AI Analysis

This paper addresses the underdeveloped state of Chinese cyberbullying detection research on social media, but its contribution is incremental, as it primarily provides a new dataset.

The authors tackle the lack of comprehensive datasets for Chinese cyberbullying detection by introducing SCCD, a session-based dataset of 677 samples from Weibo with fine-grained, comment-level annotations. They also evaluate baseline methods on it to highlight the challenges of detection.

The rampant spread of cyberbullying content poses a growing threat to societal well-being. However, research on cyberbullying detection in Chinese remains underdeveloped, primarily due to the lack of comprehensive and reliable datasets. Notably, no existing Chinese dataset is specifically tailored for cyberbullying detection. Moreover, while comments play a crucial role within sessions, current session-based datasets often lack detailed, fine-grained annotations at the comment level. To address these limitations, we present a novel Chinese cyberbullying dataset, termed SCCD, which consists of 677 session-level samples sourced from Weibo, a major social media platform. Each comment within the sessions is annotated with fine-grained labels rather than conventional binary class labels. Empirically, we evaluate the performance of various baseline methods on SCCD, highlighting the challenges of effective Chinese cyberbullying detection.

Code Implementations: 1 repo
