On The Identifiability of Mixture Models from Grouped Samples
This paper addresses a fundamental identifiability question for mixture models in statistics and machine learning, providing theoretical guarantees for nonparametric estimation from grouped data.
The paper tackles the problem of identifying mixture models without assumptions on the mixture components, using grouped samples where observations in the same group come from the same component. It shows that any mixture of m probability measures can be uniquely identified with 2m-1 observations per group, and proves that 2m-2 observations are insufficient for some mixtures.
Finite mixture models are statistical models which appear in many problems in statistics and machine learning. In such models it is assumed that data are drawn from random probability measures, called mixture components, which are themselves drawn from a probability measure P over probability measures. When estimating mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this paper, we make no assumption on the mixture components, and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same component. We show that any mixture of m probability measures can be uniquely identified provided there are 2m-1 observations per group. Moreover, we show that, for any m, there exists a mixture of m probability measures that cannot be uniquely identified when groups have 2m-2 observations. Our results hold for any sample space with more than one element.
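To make the 2m-1 versus 2m-2 threshold concrete, the following is a minimal numerical sketch for m = 2 on the two-element sample space {0, 1}. It is an illustration constructed here, not the construction used in the paper. It assumes each component is a Bernoulli measure, so the distribution of a group of n observations depends on the mixture only through the first n moments of the component biases. The function names and the two example mixtures are hypothetical. Mixture B is chosen to match the mean (0.5) and variance (0.09) of the biases in mixture A, so the two mixtures agree on groups of size 2 = 2m-2 but differ on groups of size 3 = 2m-1.

```python
from itertools import product
from math import isclose, sqrt


def group_distribution(weights, biases, n):
    """Distribution of a group of n binary observations.

    A component is picked with probability weights[i]; all n observations
    in the group are then drawn i.i.d. from Bernoulli(biases[i]).
    """
    dist = {}
    for outcome in product((0, 1), repeat=n):
        ones = sum(outcome)
        dist[outcome] = sum(
            w * p**ones * (1 - p) ** (n - ones)
            for w, p in zip(weights, biases)
        )
    return dist


def same_distribution(d1, d2, tol=1e-12):
    """True if two group distributions agree up to floating-point error."""
    return all(isclose(d1[k], d2[k], abs_tol=tol) for k in d1)


# Mixture A: two equally weighted components (illustrative values).
weights_a, biases_a = (0.5, 0.5), (0.2, 0.8)

# Mixture B: different weights and components, constructed so that the
# biases have the same mean (0.5) and variance (0.09) as in mixture A.
weights_b = (0.4, 0.6)
biases_b = (0.5 - 0.3 * sqrt(0.6 / 0.4), 0.5 + 0.3 * sqrt(0.4 / 0.6))

# Groups of size 2 = 2m-2: the two mixtures are indistinguishable.
match_n2 = same_distribution(
    group_distribution(weights_a, biases_a, 2),
    group_distribution(weights_b, biases_b, 2),
)

# Groups of size 3 = 2m-1: the third moments differ, so the mixtures
# induce different group distributions.
match_n3 = same_distribution(
    group_distribution(weights_a, biases_a, 3),
    group_distribution(weights_b, biases_b, 3),
)

print(match_n2, match_n3)
```

In this sketch, `match_n2` is true and `match_n3` is false: two observations per group cannot separate the two mixtures, while three can. This only exhibits one non-identifiable pair for m = 2; it does not by itself establish the general result stated in the abstract.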