CL, CR, CY | Oct 24, 2013

Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities

arXiv:1310.6772v1 | 40 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses online identity deception, a concern for Wikipedia administrators and researchers, but its contribution is incremental: it focuses on data collection rather than on novel detection methods.

The paper tackles the detection of sockpuppet accounts in Wikipedia by building what the authors describe as the first corpus of real-world deceptive writing: a dataset of real sockpuppet investigation cases that can be used for benchmarking research.

This paper describes the corpus of sockpuppet cases we gathered from Wikipedia. A sockpuppet is an online user account created with a fake identity for the purpose of covering abusive behavior and/or subverting the editing regulation process. We used a semi-automated method for crawling and curating a dataset of real sockpuppet investigation cases. To the best of our knowledge, this is the first corpus available on real-world deceptive writing. We describe the process for crawling the data and some preliminary results that can be used as a baseline for benchmarking research. The dataset will be released under a Creative Commons license from our project website: http://docsig.cis.uab.edu.
