DB · HC · Apr 7, 2017

PreCog: Improving Crowdsourced Data Quality Before Acquisition

arXiv:1704.02384v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses the cost and latency of crowdsourced data acquisition for applications that require high-quality data. It represents an incremental improvement over existing post-hoc quality-control methods.

The paper tackles data quality in crowdsourcing by proposing pre-hoc interface optimizations. Its Precog system uses a Segment-Predict-Explain pattern and collects 2x more high-quality text data than non-Precog approaches in two real domains.

Quality control in crowdsourcing systems is crucial. It is typically done after data collection, often using additional crowdsourced tasks to assess and improve quality. These post-hoc methods can easily add cost and latency to the acquisition process, particularly when collecting high-quality data is important. In this paper, we argue for pre-hoc interface optimizations based on feedback that helps workers improve data quality before it is submitted; such optimizations are well suited to complement post-hoc techniques. We propose the Precog system, which explicitly supports such interface optimizations for common integrity constraints as well as for more ambiguous text acquisition tasks where quality is ill-defined. We then develop the Segment-Predict-Explain pattern for detecting low-quality text segments and generating prescriptive explanations that help the worker improve their text input. Our unique combination of segmentation and prescriptive explanation is necessary for Precog to collect 2x more high-quality text data than non-Precog approaches in two real domains.
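To make the three stages of Segment-Predict-Explain concrete, here is a minimal sketch of how such a pre-hoc feedback loop might be wired together. It is not the paper's implementation. The sentence-level regex segmenter, the toy `predict` scoring heuristic, the `threshold` of 0.5, and the rule-based `explain` hints are all hypothetical stand-ins chosen for illustration; the abstract does not specify Precog's actual segmentation, quality models, or explanation generator. The shape is what matters: split the draft into segments, score each one, and attach a prescriptive hint (what to change, not just "low quality") to segments that fall below the threshold, all before the worker submits.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    segment: str       # one text segment of the worker's draft
    score: float       # predicted quality (hypothetical scale, 0..1)
    explanation: str   # prescriptive hint; empty if the segment passes


def segment(text: str) -> List[str]:
    # Assumption: sentence-level segmentation via a simple regex split.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


VAGUE_WORDS = {"good", "nice", "great", "bad", "ok", "fine"}


def predict(seg: str) -> float:
    # Hypothetical toy quality model: longer, more specific sentences score higher.
    words = seg.split()
    length_score = min(len(words) / 12, 1.0)
    vague = sum(w.lower().strip(".,!?") in VAGUE_WORDS for w in words)
    vague_ratio = vague / max(len(words), 1)
    return max(0.0, length_score - vague_ratio)


def explain(seg: str) -> str:
    # Hypothetical prescriptive rules: tell the worker *how* to improve the segment.
    if len(seg.split()) < 6:
        return "Add detail: say what specifically happened and why."
    return "Replace vague words (e.g. 'good', 'nice') with concrete observations."


def segment_predict_explain(
    text: str,
    threshold: float = 0.5,
    model: Callable[[str], float] = predict,
) -> List[Feedback]:
    # Run all three stages and return per-segment feedback for the interface.
    feedback = []
    for seg in segment(text):
        score = model(seg)
        hint = explain(seg) if score < threshold else ""
        feedback.append(Feedback(seg, score, hint))
    return feedback


if __name__ == "__main__":
    draft = ("The food was good. Our server recommended the braised short rib, "
             "which arrived hot and was tender enough to cut with a fork.")
    for fb in segment_predict_explain(draft):
        status = "OK" if not fb.explanation else fb.explanation
        print(f"{fb.score:.2f}  {fb.segment!r}  ->  {status}")
```

On the sample draft, the short, vague first sentence receives an "Add detail" hint while the specific second sentence passes. Swapping `model` for a trained classifier would leave the segment and explain stages unchanged, which is the appeal of keeping the three stages separate.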
