CL · ML · Mar 13

HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection

arXiv:2603.12920 · 76.8
Predicted impact: top 79% in CL · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses multilingual, multi-label cyberbullying detection on social media. The advance is incremental because it builds on existing BERT and self-training methods.

The paper tackles multilingual and multi-label cyberbullying detection by proposing HMS-BERT, a hybrid multi-task self-training framework. It reports a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task.

Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing methods are commonly limited by monolingual assumptions or single-task formulations, which restrict their effectiveness in realistic multilingual and multi-label scenarios. In this paper, we propose HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built upon a pretrained multilingual BERT backbone, HMS-BERT integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling is introduced to facilitate cross-lingual knowledge transfer. Experiments on four public datasets demonstrate that HMS-BERT achieves strong performance, attaining a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Ablation studies further verify the effectiveness of the proposed components.
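The abstract names two mechanisms that are easier to picture in code: joint optimization of the multi-label and three-class tasks, and confidence-based pseudo-labeling inside an iterative self-training loop. The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the loss weight `alpha`, the 0.9 confidence threshold, and the `predict` callable (standing in for the trained model's three-class softmax output) are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def joint_loss(label_probs, label_targets, class_probs, class_target, alpha=1.0):
    """Multi-task objective for one example (illustrative form only).

    Mean binary cross-entropy over the fine-grained abuse labels, plus
    alpha times the cross-entropy of the main three-class prediction.
    The weighting scheme is an assumption, not taken from the paper.
    """
    eps = 1e-12  # guards against log(0)
    bce = -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(label_probs, label_targets)
    ) / len(label_probs)
    ce = -math.log(class_probs[class_target] + eps)
    return bce + alpha * ce


def select_pseudo_labels(class_probs_batch, threshold=0.9):
    """Return (index, predicted_class) for every unlabeled example whose
    highest class probability is at least the threshold."""
    selected = []
    for i, probs in enumerate(class_probs_batch):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence)))
    return selected


def self_training_round(labeled, unlabeled, predict, threshold=0.9):
    """One self-training round: move confidently predicted unlabeled
    texts into the labeled pool and return the texts that remain."""
    probs = [predict(text) for text in unlabeled]
    picked = dict(select_pseudo_labels(probs, threshold))
    new_labeled = labeled + [(unlabeled[i], label) for i, label in picked.items()]
    remaining = [text for i, text in enumerate(unlabeled) if i not in picked]
    return new_labeled, remaining
```

In a full loop, the model would be retrained on the enlarged labeled pool after each round, and the rounds would repeat until no unlabeled example clears the threshold or a round limit is reached. A higher threshold admits fewer but cleaner pseudo-labels, which matters most for low-resource languages where early mistakes can compound.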
