Inconsistency of Pitman-Yor process mixtures for the number of components
The result exposes a critical limitation for researchers who use nonparametric Bayesian mixtures to infer the number of components in data: the posterior these methods place on that number is unreliable for this purpose.
The paper demonstrates that the posterior distribution on the number of components in Pitman-Yor process mixtures (including Dirichlet process mixtures) is inconsistent: when the data come from a finite mixture model, it fails to concentrate at the true number of components. This holds for a wide range of component distributions, including essentially all discrete families and multivariate Gaussians.
In many applications, a finite mixture is a natural model, but it can be difficult to choose an appropriate number of components. To circumvent this choice, investigators are increasingly turning to Dirichlet process mixtures (DPMs) and, more generally, Pitman-Yor process mixtures (PYMs). While these models may be well-suited for Bayesian density estimation, many investigators are using them for inferences about the number of components, by considering the posterior on the number of components represented in the observed data. We show that this posterior is not consistent; that is, on data from a finite mixture, it does not concentrate at the true number of components. This result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions, including essentially all discrete families, as well as continuous exponential families satisfying mild regularity conditions (such as multivariate Gaussians).
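
For intuition about "the number of components represented in the observed data", the following minimal sketch simulates that count under the prior alone, using the standard Chinese restaurant process description of a Pitman-Yor process. It is an illustration under stated assumptions, not the paper's argument: it involves no data and no posterior, and the function name, the parameter names (`alpha` for concentration, `discount` for the Pitman-Yor discount) and the chosen values are hypothetical. Setting `discount=0.0` gives the Dirichlet process case.

```python
import random


def sample_num_clusters(n, alpha=1.0, discount=0.0, rng=None):
    """Sample the number of occupied clusters among n points under a
    Pitman-Yor Chinese restaurant process prior (illustrative sketch).

    Assumes 0 <= discount < 1 and alpha > -discount.
    """
    rng = rng or random.Random()
    k = 0  # number of clusters opened so far
    for i in range(n):  # i points have already been assigned
        # A new cluster opens with probability (alpha + discount*k) / (alpha + i).
        # This depends only on k and i, so cluster sizes need not be tracked.
        p_new = 1.0 if i == 0 else (alpha + discount * k) / (alpha + i)
        if rng.random() < p_new:
            k += 1
    return k


def mean_num_clusters(n, reps=200, alpha=1.0, discount=0.0, seed=0):
    """Monte Carlo average of the prior number of clusters for sample size n."""
    rng = random.Random(seed)
    draws = [sample_num_clusters(n, alpha, discount, rng) for _ in range(reps)]
    return sum(draws) / reps


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mean_num_clusters(n))
```

Under these assumptions the prior count keeps growing as `n` increases rather than settling at a fixed value. That is only background for why the number of represented components deserves care; the inconsistency described above is a statement about the posterior given data from a finite mixture.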