CL | Apr 24, 2025

Towards a comprehensive taxonomy of online abusive language informed by machine learning

Samaneh Hosseini Moghaddam, Kelly Lyons, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr

arXiv:2504.17653v1 | citations: 1 | h-index: 8

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for a standardized taxonomy to improve detection and mitigation of online abuse, benefiting researchers, policymakers, and platform owners. The contribution is incremental, since it synthesizes the classification schemes of existing datasets rather than introducing new data.

The authors tackled the problem of inconsistent definitions in online abusive language detection by developing a hierarchical taxonomy with 5 categories and 17 dimensions, integrating classification systems from 18 existing datasets to provide a shared framework for researchers and stakeholders.

The proliferation of abusive language in online communications has posed significant risks to the health and wellbeing of individuals and communities. The growing concern regarding online abuse and its consequences necessitates methods for identifying and mitigating harmful content and facilitating continuous monitoring, moderation, and early intervention. This paper presents a taxonomy for distinguishing key characteristics of abusive language within online text. Our approach uses a systematic method for taxonomy development, integrating classification systems of 18 existing multi-label datasets to capture key characteristics relevant to online abusive language classification. The resulting taxonomy is hierarchical and faceted, comprising 5 categories and 17 dimensions. It classifies various facets of online abuse, including context, target, intensity, directness, and theme of abuse. This shared understanding can lead to more cohesive efforts, facilitate knowledge exchange, and accelerate progress in the field of online abuse detection and mitigation among researchers, policy makers, online platform owners, and other stakeholders.
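As a rough illustration of what "hierarchical and faceted" could mean in practice, the sketch below stores each facet as a key with its own set of labels and checks an annotation against it. The five facet names come from the abstract; the labels under each facet and the `validate_annotation` helper are hypothetical placeholders, not the paper's 17 dimensions or the authors' tooling.

```python
from typing import Dict, List, Set

# Facet names are taken from the abstract. The labels under each facet are
# hypothetical placeholders, NOT the 17 dimensions defined in the paper.
TAXONOMY: Dict[str, Set[str]] = {
    "context": {"example_context_a", "example_context_b"},
    "target": {"example_target_a", "example_target_b"},
    "intensity": {"example_intensity_a", "example_intensity_b"},
    "directness": {"example_directness_a", "example_directness_b"},
    "theme": {"example_theme_a", "example_theme_b"},
}


def validate_annotation(annotation: Dict[str, Set[str]]) -> List[str]:
    """Return a list of problems; an empty list means the annotation is valid.

    Facets are independent of one another, so an annotation may use any
    subset of facets and several labels per facet (multi-label).
    """
    problems: List[str] = []
    for facet, labels in annotation.items():
        if facet not in TAXONOMY:
            problems.append(f"unknown facet: {facet}")
            continue
        # Report labels that the facet does not define, in a stable order.
        for label in sorted(labels - TAXONOMY[facet]):
            problems.append(f"unknown label under {facet}: {label}")
    return problems
```

Keeping facets as separate keys means a single text can be described along several axes at once, which is one common way to represent a faceted scheme when merging label sets from multiple datasets.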
