ME · CO · ML · Apr 11, 2014

Model Based Clustering of High-Dimensional Binary Data

arXiv:1404.3174v2 · 27 citations
AI Analysis

This work addresses a specific gap in clustering methods for high-dimensional binary data. The contribution is incremental: it extends existing Gaussian latent variable approaches with common factor analyzers and random block effects.

The authors tackle the problem of clustering high-dimensional binary data, for which few methods exist, by proposing a mixture of latent trait models with common slope parameters (MCLT). The model gives a low-dimensional visualization of the clusters, handles within-block dependencies through random block effects, and estimates its parameters efficiently with a variational approximation algorithm.
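To make the model structure concrete, here is a minimal generative sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' exact specification: each binary item m is assumed to follow a logistic latent trait model, P(x_im = 1 | y_i) = sigmoid(b_m + w_m^T y_i); the slopes w_m (and, in this sketch, the intercepts b_m) are assumed to be shared by all clusters; and cluster g is assumed to differ only through the distribution of its d-dimensional latent variable, y_i ~ N(mu_g, Sigma_g). The function name `simulate_mclt` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_mclt(n, pi, mu, Sigma, b, W, rng):
    """Draw n binary vectors from a hypothetical MCLT-style generative model.

    Assumed structure (illustrative only):
      pi:    (G,) mixing proportions
      mu:    (G, d) cluster-specific latent means
      Sigma: (G, d, d) cluster-specific latent covariances
      b:     (M,) item intercepts, shared across clusters in this sketch
      W:     (M, d) item slopes, shared by all clusters ("common slopes")
    """
    G, d = mu.shape
    z = rng.choice(G, size=n, p=pi)  # latent cluster labels
    y = np.empty((n, d))
    for g in range(G):
        idx = np.flatnonzero(z == g)
        if idx.size:  # draw latent positions for members of cluster g
            y[idx] = rng.multivariate_normal(mu[g], Sigma[g], size=idx.size)
    eta = b + y @ W.T                          # (n, M) linear predictors
    p = 1.0 / (1.0 + np.exp(-eta))             # logistic link per item
    x = (rng.random(p.shape) < p).astype(int)  # independent Bernoulli draws
    return x, z, y


# Usage: two clusters separated along the first latent dimension.
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
Sigma = np.stack([np.eye(2), np.eye(2)])
M = 20
b = np.zeros(M)
W = rng.normal(size=(M, 2))
x, z, y = simulate_mclt(200, pi, mu, Sigma, b, W, rng)
print(x.shape)  # (200, 20)
```

Because the slope matrix W is common to every cluster, all clusters are embedded in the same d-dimensional latent space; this shared space is what makes a single low-dimensional plot of the fitted latent positions (here `y`) meaningful across clusters.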

We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a $d$-dimensional Gaussian latent variable, is extended by incorporating common factor analyzers. Accordingly, our approach facilitates a low-dimensional visual representation of the clusters. We extend the model further by the incorporation of random block effects. The dependencies in each block are taken into account through block-specific parameters that are considered to be random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Our approach is demonstrated on real and simulated data.
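The abstract does not state which variational approximation is used. A standard choice for logistic latent trait models, assumed here for illustration, is the Jaakkola-Jordan lower bound on the logistic function: sigmoid(t) >= sigmoid(xi) exp((t - xi)/2 - lambda(xi)(t^2 - xi^2)), with lambda(xi) = tanh(xi/2)/(4 xi). The sketch below checks the bound numerically; the helper names `sigmoid`, `jj_lambda` and `jj_lower_bound` are hypothetical.

```python
import numpy as np


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))


def jj_lambda(xi):
    """lambda(xi) = tanh(xi/2) / (4 xi), using its limit 1/8 at xi = 0."""
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) < 1e-8
    safe = np.where(small, 1.0, xi)  # avoid dividing by zero
    return np.where(small, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))


def jj_lower_bound(t, xi):
    """Jaakkola-Jordan bound (assumed choice): a Gaussian-shaped lower bound on sigmoid(t)."""
    return sigmoid(xi) * np.exp((t - xi) / 2.0 - jj_lambda(xi) * (t**2 - xi**2))


# The bound never exceeds the sigmoid and touches it at t = +xi and t = -xi.
t = np.linspace(-6.0, 6.0, 121)
for xi in [0.0, 1.0, 3.0]:
    gap = sigmoid(t) - jj_lower_bound(t, xi)
    assert np.all(gap >= -1e-12)
print(np.isclose(jj_lower_bound(3.0, 3.0), sigmoid(3.0)))    # True
print(np.isclose(jj_lower_bound(-3.0, 3.0), sigmoid(-3.0)))  # True
```

The appeal of this kind of bound is that it is the exponential of a quadratic in t = b_m + w_m^T y_i. Multiplying it by a Gaussian density for the latent variable therefore keeps everything Gaussian, so the otherwise intractable integral over y_i has a closed form. The variational parameter xi is then re-optimized at each iteration; for this bound the optimum satisfies xi^2 = E[t^2] under the current approximate posterior.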
