CL AI LG · Oct 3, 2023

On the definition of toxicity in NLP

Sergey Berezin, Reza Farahbakhsh, Noel Crespi

arXiv:2310.02357v3 · 0.92 citations · h-index: 27

Originality: Highly original

AI Analysis

This work addresses the problem of subjective and vague training data in toxicity detection, which could help NLP researchers and practitioners build more robust and accurate models.

The paper tackles the problem of ill-defined toxicity in NLP by proposing a new stress-level-based definition designed to be objective and context-aware, and describes its application to dataset creation and model training.

The fundamental problem in the toxicity detection task is that toxicity is ill-defined. This forces us to rely on subjective and vague data when training models, which yields results that are neither robust nor accurate: garbage in, garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. Alongside it, we also describe possible ways of applying this new definition to dataset creation and model training.
