CL AI IR LG
Sep 18, 2021

BERT-Beta: A Proactive Probabilistic Approach to Text Moderation

Fei Tan, Yifan Hu, Kevin Yen, Changwei Hu

arXiv:2109.08805v130.8663 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving text moderation for user-generated content by shifting from reactive to proactive methods, offering a novel perspective but with incremental technical contributions.

The paper introduces a proactive forecasting approach to text moderation: it predicts how likely a text is to attract toxic comments. Beta regression provides the probabilistic model, and the authors report that it performs well in comprehensive experiments.

Text moderation for user-generated content, which helps to promote healthy interaction among users, has been widely studied, and many machine learning models have been proposed. In this work, we explore an alternative perspective by augmenting reactive reviews with proactive forecasting. Specifically, we propose a new concept, "text toxicity propensity", to characterize the extent to which a text tends to attract toxic comments. Beta regression is then introduced for the probabilistic modeling and is shown to work well in comprehensive experiments. We also propose an explanation method to communicate the model's decisions clearly. Both propensity scoring and interpretation benefit text moderation in a novel manner. Finally, the proposed scaling mechanism for the linear model offers useful insights beyond this work.
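To make the beta regression idea concrete, here is a minimal sketch of how a propensity score in (0, 1) could be modeled with a beta likelihood. It is not the authors' implementation. It assumes the common mean-precision parameterization (mean `mu` from a logistic link, a single precision `phi`), and every name in it (`predicted_mean`, `beta_log_density`, `negative_log_likelihood`, the plain feature vectors) is hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic link that maps a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_mean(features, weights, bias):
    """Mean propensity for one text: sigmoid of a linear score (assumed form)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def beta_log_density(y, mu, phi):
    """Log density of Beta(mu * phi, (1 - mu) * phi) at y, with 0 < y < 1."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (
        math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )


def negative_log_likelihood(targets, feature_rows, weights, bias, phi):
    """Training objective: summed negative beta log-likelihood over all texts."""
    total = 0.0
    for y, features in zip(targets, feature_rows):
        mu = predicted_mean(features, weights, bias)
        total -= beta_log_density(y, mu, phi)
    return total
```

In this sketch, each target would be the observed fraction of toxic comments a text attracted, kept strictly between 0 and 1, and the weights, bias, and precision would be fitted by minimizing `negative_log_likelihood` with any gradient-based optimizer.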

