QM LG NC AP

Sep 11, 2013

High-dimensional cluster analysis with the Masked EM Algorithm

Shabnam N. Kadir, Dan F. M. Goodman, Kenneth D. Harris

arXiv:1309.2848v1

304 citations

Originality: Incremental advance

AI Analysis

The paper addresses the curse of dimensionality and computational inefficiency in clustering for applications such as neuroscience, though the advance appears incremental because it builds on existing mixture of Gaussians models.

The paper tackles high-dimensional cluster analysis by introducing the Masked EM algorithm, which handles cases where only a subset of features is informative for each data point. The authors show that it performs close to optimally on simulated Gaussian data and in a spike sorting application.

Cluster analysis faces two problems in high dimensions: first, the "curse of dimensionality", which can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. In many applications, only a small subset of features provides information about the cluster membership of any one data point; however, this informative feature subset may not be the same for all data points. Here we introduce a "Masked EM" algorithm for fitting mixture of Gaussians models in such cases. We show that the algorithm performs close to optimally on simulated Gaussian data, and in an application to "spike sorting" of high channel-count neuronal recordings.
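The idea of a per-point informative feature subset can be pictured with a small sketch. The abstract does not spell out how masked features are handled, so the code below rests on an assumption: each uninformative feature is treated as a draw from a fixed per-feature noise distribution, and the clustering step would then work with the resulting expected value and variance. The function name `masked_expectations` and the parameters `noise_mean` and `noise_var` are hypothetical and chosen only for illustration.

```python
import numpy as np

def masked_expectations(x, mask, noise_mean, noise_var):
    """Blend observed features with an assumed per-feature noise model.

    x          : (n_points, n_features) data matrix
    mask       : (n_points, n_features) weights in [0, 1]; 1 = informative
    noise_mean : (n_features,) assumed mean of an uninformative feature
    noise_var  : (n_features,) assumed variance of an uninformative feature

    Returns the expected value and the variance of each feature after
    masked entries are replaced by the noise model (an assumption here,
    not a detail stated in the abstract).
    """
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=float)
    noise_mean = np.asarray(noise_mean, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)

    # Expected value: observed where unmasked, noise mean where masked.
    expected = mask * x + (1.0 - mask) * noise_mean
    # Second moment, then variance = E[z^2] - E[z]^2.
    expected_sq = mask * x**2 + (1.0 - mask) * (noise_mean**2 + noise_var)
    variance = expected_sq - expected**2
    return expected, variance

# Two points, two features: each point is informative on a different feature.
x = np.array([[2.0, 0.1], [0.2, 3.0]])
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
expected, variance = masked_expectations(x, mask, np.zeros(2), np.ones(2))
```

In this toy case, unmasked entries keep their observed value with zero added variance, while masked entries fall back to the assumed noise mean and variance. A mixture of Gaussians fit could then skip or cheaply approximate the masked entries, which is one plausible source of the speed-up the abstract mentions.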

