ML

Sep 30, 2017

Decontamination of Mutual Contamination Models

Julian Katz-Samuels, Gilles Blanchard, Clayton Scott

arXiv:1710.01167v2

8.125 citations

Originality: Highly original

AI Analysis

This work addresses a foundational issue in machine learning for applications like noisy classification and data demixing, offering theoretical and algorithmic solutions.

The paper tackles the problem of inferring unknown base distributions from convex combinations in mutual contamination models, providing identifiability conditions and algorithms with performance guarantees for multiclass classification with label noise, demixing of mixed membership models, and classification with partial labels.

Many machine learning problems can be characterized by mutual contamination models. In these problems, one observes several random samples from different convex combinations of a set of unknown base distributions and the goal is to infer these base distributions. This paper considers the general setting where the base distributions are defined on arbitrary probability spaces. We examine three popular machine learning problems that arise in this general setting: multiclass classification with label noise, demixing of mixed membership models, and classification with partial labels. In each case, we give sufficient conditions for identifiability and present algorithms for the infinite and finite sample settings, with associated performance guarantees.
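The data-generating side of a mutual contamination model can be illustrated with a small simulation. The sketch below is only an illustration of the setting described in the abstract, not the paper's decontamination algorithm: the two Gaussian base distributions, the mixing weights in `mixing`, and the helper name `sample_contaminated` are all assumptions chosen for the example.

```python
import random


def sample_contaminated(mixing_row, base_samplers, n, rng):
    """Draw n points from the convex combination sum_j mixing_row[j] * P_j.

    mixing_row must be nonnegative and sum to 1 (a point on the simplex).
    """
    assert min(mixing_row) >= 0 and abs(sum(mixing_row) - 1.0) < 1e-9
    draws = []
    for _ in range(n):
        # Pick which base distribution generates this point, then sample it.
        j = rng.choices(range(len(base_samplers)), weights=mixing_row)[0]
        draws.append(base_samplers[j](rng))
    return draws


rng = random.Random(0)

# Hypothetical base distributions P_1 and P_2 (unknown to the learner).
base_samplers = [
    lambda r: r.gauss(-2.0, 1.0),
    lambda r: r.gauss(2.0, 1.0),
]

# Hypothetical mixing weights: row i defines the i-th contaminated distribution.
mixing = [
    [0.8, 0.2],
    [0.3, 0.7],
]

# The learner observes only these contaminated samples.
samples = [sample_contaminated(row, base_samplers, 5000, rng) for row in mixing]
means = [sum(s) / len(s) for s in samples]
```

Under these assumed parameters, each sample mean should lie near the corresponding convex combination of the base means (about -1.2 and 0.8). The inference task described above runs in the opposite direction: given only `samples`, recover the base distributions.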
