Embracing Error to Enable Rapid Crowdsourcing
This work addresses the scalability problem faced by researchers and practitioners in social science and machine learning who rely on crowdsourced datasets, and it reports a far larger speedup than the incremental gains of prior approaches.
The paper tackles the high cost and slow pace of microtask crowdsourcing by introducing a technique that pushes workers to judge so quickly that errors are expected. These errors are then rectified by randomizing task order and modeling response latency, often achieving an order-of-magnitude (10x) speedup over fixed majority vote.
Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. We evaluate our technique on a breadth of common labeling tasks such as image verification, word similarity, sentiment analysis and topic classification. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.
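To make the idea of rectifying errors through randomized task order and a latency model more concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes items are shown in a rapid stream at a fixed rate, that a worker presses a key when they see a positive item, and that the reaction delay is roughly Gaussian. The function names (`latency_pdf`, `score_items`, `simulate_worker`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def latency_pdf(delay, mean, std):
    """Gaussian density of a reaction delay in seconds (an assumed model)."""
    z = (delay - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def score_items(streams, n_items, item_duration, mean, std):
    """Accumulate, per item, the probability that each keypress referred to it.

    streams: list of (order, keypress_times), one pair per worker, where
    order is that worker's randomized sequence of item ids.
    """
    scores = [0.0] * n_items
    for order, keypress_times in streams:
        for t in keypress_times:
            # Weight every item shown before the keypress by how plausible
            # the implied reaction delay is under the latency model.
            weights = []
            for pos in range(len(order)):
                delay = t - pos * item_duration
                weights.append(latency_pdf(delay, mean, std) if delay >= 0 else 0.0)
            total = sum(weights)
            if total == 0:
                continue  # keypress cannot be attributed to any item
            for item, w in zip(order, weights):
                scores[item] += w / total
    return scores


def simulate_worker(positives, n_items, item_duration, mean, std, rng):
    """Simulate one worker: shuffled order, one delayed keypress per positive."""
    order = list(range(n_items))
    rng.shuffle(order)  # randomization decorrelates neighbors across workers
    presses = []
    for pos, item in enumerate(order):
        if item in positives:
            delay = max(0.0, rng.gauss(mean, std))
            presses.append(pos * item_duration + delay)
    return order, presses


if __name__ == "__main__":
    rng = random.Random(0)
    positives = {3, 17}
    streams = [simulate_worker(positives, 30, 0.1, 0.4, 0.1, rng) for _ in range(20)]
    scores = score_items(streams, 30, 0.1, 0.4, 0.1)
    ranked = sorted(range(30), key=lambda i: scores[i], reverse=True)
    print("highest-scoring items:", ranked[:5])
```

In this sketch, a single keypress is ambiguous because the delay smears it over several neighboring items. Each worker sees a different random order, so a true positive keeps collecting probability mass across workers while its accidental neighbors change each time and collect little.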