HCITMLOct 5, 2016

Universal Clustering via Crowdsourcing

arXiv:1610.02276v11 citations
Originality Incremental advance
AI Analysis

This addresses the problem of scalable and reliable clustering in crowdsourcing for applications like data labeling, though it is incremental in extending existing models with memory traits.

The paper tackles unsupervised clustering of objects using crowdsourced human responses, without prior knowledge of worker reliability or task difficulty, by modeling worker behavior with memory traits and proving asymptotic consistency and order-optimal sample complexity for the algorithms.

Consider unsupervised clustering of objects drawn from a discrete set, through the use of human intelligence available in crowdsourcing platforms. This paper defines and studies the problem of universal clustering using responses of crowd workers, without knowledge of worker reliability or task difficulty. We model stochastic worker response distributions by incorporating traits of memory for similar objects and traits of distance among differing objects. We are particularly interested in two limiting worker types---temporary workers who retain no memory of responses and long-term workers with memory. We first define clustering algorithms for these limiting cases and then integrate them into an algorithm for the unified worker model. We prove asymptotic consistency of the algorithms and establish sufficient conditions on the sample complexity of the algorithm. Converse arguments establish necessary conditions on sample complexity, proving that the defined algorithms are asymptotically order-optimal in cost.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes