A clustering approach to infer Wikipedia contributors' profile
This work addresses the need for simpler, replicable methods to profile online community contributors, though its contribution is incremental: it applies clustering to existing data.
The paper tackles the problem of identifying Wikipedia contributor profiles using only edit data and its temporal distribution. The resulting profiles show good accuracy and stability across languages, and the method is validated on the Romanian and Danish wikis.
In online communities, recent studies have greatly improved our knowledge of the different types, or profiles, of contributors, from casual contributors, through focused ones, to very involved ones. However, they rely on very complex methodologies (a qualitative-quantitative mix, with a high workload to manually codify and characterize the edits), which limits their replication by practitioners. These studies also cover the English Wikipedia only. The objective of this paper is to highlight different profiles of contributors using clustering techniques. Its originality is to show that the edits alone, together with their distribution over time, make it possible to build these contributor profiles with good accuracy and with stability across languages. The methodology is validated on both the Romanian and Danish wikis. The highlighted profiles are identifiable early in a contributor's history of involvement, suggesting that light monitoring of newcomers may be sufficient to adapt interactions with them and increase the retention rate.
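As a rough illustration of what building profiles from edits and their distribution over time can look like, the sketch below clusters contributors on three per-contributor features. It is not the paper's pipeline: the abstract does not name the clustering algorithm or the features, so the use of k-means, the feature set (total edits, active months, activity span), the toy data, and all function names are assumptions made for this example.

```python
from collections import defaultdict

def monthly_features(edits):
    """Turn (contributor, month_index) pairs into one feature tuple per contributor.

    Hypothetical features: (total edits, number of active months, activity span in months).
    """
    months = defaultdict(list)
    for user, month in edits:
        months[user].append(month)
    return {
        user: (float(len(ms)), float(len(set(ms))), float(max(ms) - min(ms) + 1))
        for user, ms in months.items()
    }

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic initialisation (an assumed choice)."""
    ordered = sorted(points)
    # Start from k points spread evenly over the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Toy edit log (invented for illustration): two occasional and two sustained contributors.
edits = [("casual_a", 0), ("casual_b", 3), ("casual_b", 3)]
edits += [("core_a", m) for m in range(12) for _ in range(5)]
edits += [("core_b", m) for m in range(10) for _ in range(4)]

features = monthly_features(edits)
users = list(features)
labels, centers = kmeans([features[u] for u in users], k=2)
profiles = dict(zip(users, labels))
print(profiles)
```

On real data the features would normally be rescaled before clustering, since raw edit counts otherwise dominate the distance, and the number of clusters would have to be chosen from the data rather than fixed at two.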