Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
The task addresses data quality on Wikidata, which matters to both its users and to researchers. The contribution is incremental, however, since it builds on existing vandalism detection work.
The paper tackles Wikidata vandalism detection by recasting it as an online learning task that requires near real-time predictions. The best-performing submission achieves a ROC-AUC of 0.947 and a PR-AUC of 0.458.
We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions. This paper describes their evaluation and compares them to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, which requires participant software to predict vandalism in near real-time. The best-performing approach achieves a ROC-AUC of 0.947 and a PR-AUC of 0.458. The task was also organized as a software submission task: to maximize reproducibility and to foster future research and development, participants were asked to submit their working software to the TIRA experimentation platform, along with its source code for open-source release.
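The two reported metrics can diverge sharply, which helps show how a ROC-AUC near 0.95 can coexist with a PR-AUC below 0.5. The following is a minimal sketch, not the task's official evaluation code. It assumes binary labels (1 = vandalism, 0 = regular edit) and real-valued classifier scores. It computes ROC-AUC by pairwise ranking and approximates PR-AUC as average precision, which is one common estimator. Other ways of integrating the precision-recall curve can give slightly different numbers. The function names and toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Rank-based ROC-AUC: the probability that a randomly chosen positive
    # (vandalism) gets a higher score than a randomly chosen negative.
    # Ties count as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    # PR-AUC approximated as average precision: the mean of the precision
    # values at the rank of each positive, scanning scores from high to low.
    # Tied scores keep their input order because Python's sort is stable.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 1 vandalism edit among 10 edits,
    # which the classifier ranks third.
    labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")                # prints ROC-AUC: 0.778
    print(f"PR-AUC (AP): {average_precision(labels, scores):.3f}")  # prints PR-AUC (AP): 0.333
```

On this toy data, ranking the single vandalism edit third still places it above 7 of the 9 regular edits, so ROC-AUC is 7/9. The two regular edits ranked above it, however, cut precision at that point to 1/3. When positives are rare, PR-AUC punishes such misrankings much more than ROC-AUC does.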