CL LG | Mar 13, 2025

Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions

arXiv:2503.10294v1 | 16.311 citations | h-index: 6 | Proceedings of the Tenth Workshop on Noisy and User-generated Text

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of improving content moderation efficiency for platforms like Wikipedia, but it is incremental as it applies existing methods to new data without introducing novel techniques.

The study tackles automated content moderation in collaborative knowledge hubs by constructing a database of deletion discussions from several Wikis in three languages, then evaluating language models on tasks that range from predicting a discussion's outcome to identifying the implicit policy a comment points to. The results show that discussions leading to deletion are easier to predict, and that self-produced tags (keep, delete or redirect) do not always help the classifiers, presumably because users hesitate or deliberate within their comments.

Automated content moderation for collaborative knowledge hubs like Wikipedia or Wikidata is an important yet challenging task due to multiple factors. In this paper, we construct a database of discussions happening around articles marked for deletion in several Wikis and in three languages, which we then use to evaluate a range of LMs on different tasks (from predicting the outcome of the discussion to identifying the implicit policy an individual comment might be pointing to). Our results reveal, among other things, that discussions leading to deletion are easier to predict, and that, surprisingly, self-produced tags (keep, delete or redirect) do not always help guide the classifiers, presumably because of users' hesitation or deliberation within comments.
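To make the role of self-produced tags concrete, here is a minimal sketch of how a comment could be split into its leading tag and its remaining text, so that a classifier could be compared with and without the tag. This is an illustration only, not the authors' preprocessing: the function name `split_tag`, the regular expression, the assumption that tags open the comment (optionally in bold wiki markup), and the sample comments are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical tag set, taken from the three labels named in the abstract.
VOTE_TAGS = ("keep", "delete", "redirect")

# Assumption: a tag appears at the start of a comment, optionally wrapped in
# bold markup ('''Delete''' or **Delete**), followed by punctuation or spaces.
_TAG_RE = re.compile(
    r"^\s*(?:'''|\*\*)?\s*(" + "|".join(VOTE_TAGS) + r")\b\s*(?:'''|\*\*)?[\s:,.\-]*",
    re.IGNORECASE,
)


def split_tag(comment: str) -> Tuple[Optional[str], str]:
    """Return (tag, remaining_text); tag is None when no leading tag is found."""
    match = _TAG_RE.match(comment)
    if match is None:
        return None, comment.strip()
    return match.group(1).lower(), comment[match.end():].strip()


if __name__ == "__main__":
    # Invented comments, for illustration only.
    samples = [
        "'''Delete''' per WP:NOTDICT, this is a dictionary definition.",
        "Keep, but I am leaning towards redirect.",
        "This looks like a dictionary entry.",
    ]
    for text in samples:
        print(split_tag(text))
```

The second sample shows why a leading tag may be an unreliable signal: the comment opens with "Keep" yet deliberates towards a different outcome in the text that follows.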

