Building automated vandalism detection tools for Wikidata
This work addresses the problem of maintaining quality in open, structured knowledge bases such as Wikidata. The contribution is incremental, since it builds on prior Wikipedia vandalism detection methods.
The paper tackles the problem of vandalism in Wikidata by developing automated detection tools that catch 89% of vandalism while reducing patrollers' workload by 98%.
Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open collaboration model is powerful in that it reduces barriers to participation and allows a large number of people to contribute. However, it exposes the knowledge base to the risk of vandalism and low-quality contributions. In this work, we build on past work on detecting vandalism in Wikipedia to detect vandalism in Wikidata. This work is novel in that identifying damaging changes in a structured knowledge base requires substantially different feature engineering than in a text-based wiki like Wikipedia. We also discuss the utility of these classifiers for reducing the overall workload of vandalism patrollers in Wikidata. We describe a machine classification strategy that is able to catch 89% of vandalism while reducing patrollers' workload by 98%, by drawing lightly from contextual features of an edit and heavily from the characteristics of the user making the edit.
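The relationship between the share of vandalism caught and the reduction in patroller workload can be illustrated with a small sketch. It assumes a classifier has already assigned each edit a vandalism score, and that patrollers review only edits at or above a threshold. The function name `workload_at_recall` and the toy scores and labels are hypothetical and are not taken from the paper; they only show how the two quantities are related.

```python
import math


def workload_at_recall(scores, labels, target_recall):
    """Find the highest score threshold that still catches at least
    `target_recall` of the vandalism, and report what fraction of all
    edits patrollers would have to review at that threshold.

    scores: classifier scores, higher means more likely vandalism
    labels: 1 for vandalism, 0 for a good edit (same order as scores)
    """
    total_vandalism = sum(labels)
    if total_vandalism == 0:
        raise ValueError("labels contain no vandalism examples")

    # Number of vandalism edits that must be caught; the small epsilon
    # guards against floating-point error pushing ceil() one too high.
    needed = math.ceil(target_recall * total_vandalism - 1e-9)

    # Walk down the edits from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    caught = 0
    threshold = ranked[0][0]
    for score, is_vandalism in ranked:
        caught += is_vandalism
        threshold = score
        if caught >= needed:
            break

    # Everything scoring at or above the threshold goes to patrollers
    # (this also counts edits tied with the threshold score).
    flagged = sum(1 for s in scores if s >= threshold)
    return threshold, flagged / len(scores)


# Hypothetical toy data: 10 edits, 4 of which are vandalism.
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05, 0.01]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

threshold, review_fraction = workload_at_recall(toy_scores, toy_labels, 0.75)
workload_reduction = 1 - review_fraction
print(threshold, review_fraction, workload_reduction)
```

Read this way, the reported figures would correspond to a threshold at which patrollers review roughly 2% of edits and still see 89% of the vandalism. The toy data is far less skewed than real edit streams, so its numbers are not comparable to the paper's.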