ME CO ML
Apr 11, 2014

Model Based Clustering of High-Dimensional Binary Data

Yang Tang, Ryan P. Browne, Paul D. McNicholas

arXiv:1404.3174v2
27 citations

AI Analysis

This work addresses a specific gap in clustering methods for high-dimensional binary data. The contribution is incremental: it extends existing Gaussian latent variable approaches with common factor analyzers and random block effects.

The authors address the problem of clustering high-dimensional binary data, for which few methods exist. They propose a mixture of latent trait models with common slope parameters (MCLT), which enables low-dimensional visualization of the clusters and handles within-block dependencies through random block effects. Parameters are estimated efficiently with a variational approximation to the likelihood.

We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a $d$-dimensional Gaussian latent variable, is extended by incorporating common factor analyzers. Accordingly, our approach facilitates a low-dimensional visual representation of the clusters. We extend the model further by the incorporation of random block effects. The dependencies in each block are taken into account through block-specific parameters that are considered to be random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Our approach is demonstrated on real and simulated data.
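To make the model structure in the abstract concrete, the following is a minimal generative sketch, not the authors' implementation. It assumes a logistic link between the $d$-dimensional Gaussian latent variable and each binary response, with one slope matrix shared by all clusters. The function name `simulate_mclt` and all parameter names are hypothetical, and the random block effects and the variational fitting algorithm are not included.

```python
import numpy as np

def simulate_mclt(n, weights, means, covs, slopes, seed=0):
    """Draw binary data from a mixture of latent trait models (sketch).

    Assumptions (not taken from the paper's text): a logistic link,
    and a slope matrix `slopes` of shape (M, d) shared by all clusters.
    """
    rng = np.random.default_rng(seed)
    n_clusters = len(weights)
    d = slopes.shape[1]

    # Cluster membership for each observation.
    labels = rng.choice(n_clusters, size=n, p=weights)

    # Cluster-specific d-dimensional Gaussian latent variable.
    latent = np.empty((n, d))
    for g in range(n_clusters):
        idx = labels == g
        latent[idx] = rng.multivariate_normal(means[g], covs[g], size=idx.sum())

    # Common slopes map the latent variable to M response probabilities.
    probs = 1.0 / (1.0 + np.exp(-(latent @ slopes.T)))
    data = (rng.random(probs.shape) < probs).astype(int)
    return data, labels, latent
```

Because the slopes are shared, all clusters live in the same $d$-dimensional latent space, which is what makes a low-dimensional plot of the clusters possible when $d$ is 2 or 3.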
