CL LG SI | Feb 6, 2025

A Classification System Approach in Predicting Chinese Censorship

arXiv:2502.04234v1

Originality: Synthesis-oriented

AI Analysis

This work addresses automated prediction of censorship in China, a problem of interest to researchers and policymakers, but its contribution is incremental: it applies existing methods to a specific dataset.

The paper tackles the prediction of censorship of Weibo posts in China by developing classification models, and finds that a fine-tuned BERT model outperforms the other methods on both macro-F1 and ROC-AUC.

This paper uses classifiers to predict whether a Weibo post would be censored on the Chinese internet. Through random sampling from Fu (2021) and Chinese tokenization strategies, we constructed a cleaned dataset of Chinese phrases with binary censorship labels. Applying several probability-based information retrieval methods to the data, we derived four logistic regression models for classification. We also experimented with pre-trained transformers on the same classification task. After evaluating both the macro-F1 and ROC-AUC metrics, we concluded that the fine-tuned BERT model outperforms the other strategies.
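The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `macro_f1` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. Macro-F1 is the unweighted mean of the per-class F1 scores, so the censored and uncensored classes count equally even if one is rare. ROC-AUC is the probability that a randomly chosen censored post receives a higher score than a randomly chosen uncensored one.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = censored, 0 = not censored.
y_true = [1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]       # made-up classifier scores
y_pred = [int(s >= 0.5) for s in y_score]      # assumed 0.5 threshold

print(macro_f1(y_true, y_pred))  # 0.625
print(roc_auc(y_true, y_score))  # 0.875
```

Macro-F1 depends on the chosen threshold, while ROC-AUC depends only on how the scores rank the posts, which is one reason to report both.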
