Accurate inference of crowdsourcing properties when using efficient allocation strategies
This work addresses biased inference in crowdsourcing, offering researchers and practitioners a method to extract more knowledge from non-representative datasets. The contribution is incremental in that it builds on existing allocation strategies.
The paper tackles the bias that efficient allocation strategies introduce in crowdsourcing, which complicates inference of problem-wide properties such as task difficulty and worker completion times. It introduces Decision-Explicit Probability Sampling (DEPS), which infers these properties accurately while retaining the efficiency gains of the allocation strategy, and which outperforms baseline methods in experiments on real and synthetic data.
Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.
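The bias described above can be illustrated with a small simulation. The sketch below is not DEPS, whose details are not given in this abstract. It is a generic inverse-probability weighting example under assumed conditions: task difficulty is uniform on [0, 1], the allocation rule completes a task with probability 0.9 - 0.8 * difficulty, and that probability is known exactly. All names (simulate, naive_mean, weighted_mean) and all numeric settings are hypothetical and chosen only for illustration.

```python
import random


def simulate(n_tasks=20000, seed=0):
    """Generate tasks and mark which ones a biased allocator completes."""
    rng = random.Random(seed)
    # Assumed: difficulty is uniform on [0, 1]; higher means harder.
    difficulty = [rng.random() for _ in range(n_tasks)]
    # Assumed allocation rule: easy tasks are completed far more often.
    # The probability is known here only because we define the rule ourselves.
    p_complete = [0.9 - 0.8 * d for d in difficulty]
    completed = [rng.random() < p for p in p_complete]
    return difficulty, p_complete, completed


def naive_mean(difficulty, completed):
    """Average difficulty over completed tasks only (ignores the bias)."""
    values = [d for d, c in zip(difficulty, completed) if c]
    return sum(values) / len(values)


def weighted_mean(difficulty, p_complete, completed):
    """Self-normalized inverse-probability weighted mean over completed tasks."""
    rows = [(d, 1.0 / p) for d, p, c in zip(difficulty, p_complete, completed) if c]
    total_weight = sum(w for _, w in rows)
    return sum(d * w for d, w in rows) / total_weight


if __name__ == "__main__":
    difficulty, p_complete, completed = simulate()
    print("true mean difficulty:    ", sum(difficulty) / len(difficulty))
    print("naive mean (completed):  ", naive_mean(difficulty, completed))
    print("weighted mean (completed):", weighted_mean(difficulty, p_complete, completed))
```

Under these assumptions the naive mean underestimates typical difficulty because completed tasks skew easy, while the weighted mean stays close to the true value. In practice the completion probabilities are not handed to the analyst, so accounting for the allocation strategy's decisions is the harder part of the problem the abstract describes.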