LG DS STOct 2, 2012

Learning mixtures of structured distributions over discrete domains

Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun

arXiv:1210.0864v187 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of mixture learning for structured distributions, which is incremental as it builds on existing structural properties to improve algorithmic efficiency.

The paper tackles the problem of learning mixtures of structured probability distributions over discrete domains by developing a general algorithm that is efficient in running time and sample complexity, achieving near-optimal efficiency for distributions like log-concave, monotone hazard rate, and unimodal types.

Let $\mathfrak{C}$ be a class of probability distributions over the discrete domain $[n] = \{1,...,n\}.$ We show that if $\mathfrak{C}$ satisfies a rather general condition -- essentially, that each distribution in $\mathfrak{C}$ can be well-approximated by a variable-width histogram with few bins -- then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of $k$ unknown distributions from $\mathfrak{C}.$ We analyze several natural types of distributions over $[n]$, including log-concave, monotone hazard rate and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems.

View on arXiv PDF

Similar