ML · LG · Nov 25, 2017

Feature Selection Facilitates Learning Mixtures of Discrete Product Distributions

arXiv:1711.09195v1
Originality: Incremental advance
AI Analysis

This work aims to make learning in crowdsourcing and similar tasks more robust by selecting reliable features (workers). The advance is incremental: it builds on existing statistical methods.

The paper addresses the problem of learning mixtures of discrete product distributions, as arise in crowdsourcing. It uses feature selection based on pairwise mutual information to eliminate less reliable workers, and reports that improvements on real data sets can be substantial.

Feature selection can facilitate the learning of mixtures of discrete random variables as they arise, e.g., in crowdsourcing tasks. Intuitively, not all workers are equally reliable but, if the less reliable ones could be eliminated, then learning should be more robust. By analogy with Gaussian mixture models, we seek a low-order statistical approach, and here introduce an algorithm based on the (pairwise) mutual information. This induces an order over workers that is well structured for the 'one coin' model. More generally, it is justified by a goodness-of-fit measure and is validated empirically. Improvement in real data sets can be substantial.
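To make the idea of a mutual-information-induced order over workers concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a binary one-coin model in which worker j reports the true label with probability p_j and flips it otherwise, and it scores each worker by the sum of its empirical pairwise mutual information with every other worker. That scoring rule, and the names `mutual_information`, `rank_workers`, and `simulate_one_coin`, are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_workers(labels):
    """Rank workers by summed pairwise MI (an illustrative score, not the paper's).

    labels[j] holds worker j's labels on a shared list of tasks.
    Returns (order, scores), with order sorted from highest to lowest score.
    """
    m = len(labels)
    scores = [0.0] * m
    for j in range(m):
        for k in range(j + 1, m):
            mi = mutual_information(labels[j], labels[k])
            scores[j] += mi
            scores[k] += mi
    order = sorted(range(m), key=lambda j: scores[j], reverse=True)
    return order, scores


def simulate_one_coin(accuracies, n_tasks, seed=0):
    """Simulate binary labels: worker j is correct with probability accuracies[j]."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_tasks)]
    return [[z if rng.random() < p else 1 - z for z in truth] for p in accuracies]


# Hypothetical accuracies; the last worker answers at chance level.
accuracies = [0.95, 0.85, 0.70, 0.55, 0.50]
labels = simulate_one_coin(accuracies, n_tasks=20000)
order, scores = rank_workers(labels)
print(order)
print([round(s, 4) for s in scores])
```

Under these assumptions, the mutual information between two workers grows with the product of their reliabilities, so the summed score tends to place the more accurate workers first. Dropping the tail of `order` before fitting a mixture model is one way to read "eliminating less reliable workers"; where to cut the list is left open here.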

