LG · ML · Jul 19, 2018

Online Label Aggregation: A Variational Bayesian Approach

arXiv:1807.07291v2
AI Analysis

This work addresses the need for efficient and timely label aggregation in crowdsourcing applications, offering an incremental solution that improves accuracy over existing methods.

The paper tackles the problem of noisy labels in crowdsourced data by proposing an online label aggregation framework called BiLA, which uses variational Bayesian inference and a novel stochastic optimization scheme to incrementally infer true labels. It reports error rate reductions of at least 10 percentage points on synthetic datasets and 1.5 percentage points on real-world datasets.

Noisy labeled data is more the norm than a rarity for crowdsourced content. It is effective to distill noise and infer correct labels by aggregating results from crowd workers. To ensure timeliness and overcome slow worker responses, online label aggregation is increasingly in demand, calling for solutions that can incrementally infer the true label distribution from subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs variational Bayesian inference and a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels through exact computation of its posterior distribution. We also derive a convergence bound for the proposed optimizer. We compare BiLA with state-of-the-art methods based on minimax entropy, neural networks, and expectation maximization, on synthetic and real-world datasets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with error rate reductions of at least 10 and 1.5 percentage points for synthetic and real-world datasets, respectively.
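
To make the idea of incremental aggregation concrete, the following is a minimal sketch of online label aggregation for binary labels. It is not BiLA: it swaps in a much simpler one-coin worker model with Beta pseudo-counts and a single soft update per batch, and it assumes a uniform prior over true labels. The class name `OnlineOneCoinAggregator`, its parameters, and the batch format are hypothetical choices made for illustration only.

```python
import math
from collections import defaultdict


class OnlineOneCoinAggregator:
    """Streaming aggregation of binary labels (illustrative sketch, not BiLA).

    Assumption: each worker answers correctly with one unknown accuracy
    (the "one-coin" model), tracked as Beta pseudo-counts [correct, wrong].
    """

    def __init__(self, prior_correct=2.0, prior_wrong=1.0):
        # Prior pseudo-counts encode a mild belief that workers beat chance.
        self.counts = defaultdict(lambda: [prior_correct, prior_wrong])

    def _log_odds(self, worker):
        correct, wrong = self.counts[worker]
        acc = correct / (correct + wrong)  # posterior mean accuracy
        return math.log(acc / (1.0 - acc))

    def update(self, batch):
        """Process one batch and return {item_id: P(true label = 1)}.

        batch maps item_id -> list of (worker_id, label) with label in {0, 1}.
        """
        posteriors = {}
        for item, votes in batch.items():
            # A vote for 1 adds the worker's log-odds, a vote for 0 subtracts it.
            score = sum(self._log_odds(w) * (1 if y == 1 else -1)
                        for w, y in votes)
            posteriors[item] = 1.0 / (1.0 + math.exp(-score))

        # Soft update: credit each worker with the probability that the
        # vote matched the (uncertain) true label.
        for item, votes in batch.items():
            p1 = posteriors[item]
            for w, y in votes:
                p_correct = p1 if y == 1 else 1.0 - p1
                self.counts[w][0] += p_correct
                self.counts[w][1] += 1.0 - p_correct
        return posteriors


# Usage: batches arrive over time; worker estimates carry over between calls.
agg = OnlineOneCoinAggregator()
first = agg.update({"item-a": [("w1", 1), ("w2", 1), ("w3", 0)]})
second = agg.update({"item-b": [("w1", 0), ("w3", 1)]})
```

Because worker estimates persist between calls to `update`, later batches are weighted by what earlier batches revealed about each worker, which is the incremental behavior the abstract describes. The variational posterior, the stochastic optimizer, and the convergence bound of the actual method are beyond what this sketch covers.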
