ME · CO · ML · Feb 26, 2014

Asymmetric Clusters and Outliers: Mixtures of Multivariate Contaminated Shifted Asymmetric Laplace Distributions

arXiv:1402.6744v2 · 38 citations
AI Analysis

This work addresses robust clustering when the clusters are asymmetric and the data contain outliers. It is an incremental improvement over existing approaches such as trimming: the contamination parameters are estimated from the data rather than fixed in advance, and outliers are detected automatically.

The paper clusters data containing asymmetric clusters and outliers with a mixture of multivariate contaminated shifted asymmetric Laplace distributions. The model detects outliers automatically and estimates its contamination parameters without prior specification. It is compared with well-established finite mixture models on artificial and real data, where it is reported to perform better.

Mixtures of multivariate contaminated shifted asymmetric Laplace distributions are developed for handling asymmetric clusters in the presence of outliers (also referred to as bad points herein). In addition to the parameters of the related non-contaminated mixture, for each (asymmetric) cluster, our model has one parameter controlling the proportion of outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach that is absent from other approaches such as trimming. Moreover, each observation is given a posterior probability of belonging to a particular cluster, and of being an outlier or not; advantageously, this allows for the automatic detection of outliers. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. The behaviour of the proposed model is investigated, and compared with well-established finite mixtures, on artificial and real data.
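
The two-level classification described in the abstract (first a cluster, then good versus bad within that cluster) can be illustrated with a minimal univariate sketch. It is not the paper's implementation. It assumes the one-dimensional special case of the shifted asymmetric Laplace density, which has a closed form, and it assumes that the bad component keeps the cluster's location while its scale is inflated by a factor eta > 1 and its skewness by sqrt(eta). All names (sal_pdf, prop_good, eta, classify, example_clusters) are hypothetical, the parameter values are made up for illustration, and the parameters are taken as given rather than estimated by the expectation-conditional maximization algorithm.

```python
import math


def sal_pdf(x, mu, sigma2, alpha):
    """Univariate shifted asymmetric Laplace density.

    mu is the location, sigma2 > 0 the scale parameter and alpha the
    skewness; alpha = 0 reduces to a symmetric Laplace density.
    """
    gamma = math.sqrt(2.0 * sigma2 + alpha * alpha)
    d = x - mu
    return math.exp((d * alpha - gamma * abs(d)) / sigma2) / gamma


def contaminated_parts(x, c):
    """Weighted good and bad densities of x for one cluster c.

    ASSUMPTION: the bad component shares the location, with scale
    inflated by eta > 1 and skewness scaled by sqrt(eta).
    """
    good = c["prop_good"] * sal_pdf(x, c["mu"], c["sigma2"], c["alpha"])
    bad = (1.0 - c["prop_good"]) * sal_pdf(
        x, c["mu"], c["eta"] * c["sigma2"], math.sqrt(c["eta"]) * c["alpha"]
    )
    return good, bad


def posteriors(x, clusters):
    """Posterior cluster probabilities and P(bad | cluster) for x.

    Densities are multiplied directly for readability; a practical
    version would work in log space to avoid underflow far in the tails.
    """
    parts = [contaminated_parts(x, c) for c in clusters]
    joint = [c["weight"] * (g + b) for c, (g, b) in zip(clusters, parts)]
    total = sum(joint)
    cluster_post = [j / total for j in joint]
    bad_post = [b / (g + b) for g, b in parts]
    return cluster_post, bad_post


def classify(x, clusters):
    """Assign x to its most probable cluster; flag it if P(bad) > 0.5."""
    cluster_post, bad_post = posteriors(x, clusters)
    k = max(range(len(clusters)), key=lambda i: cluster_post[i])
    return k, bad_post[k] > 0.5


# Hypothetical two-cluster model, both clusters skewed to the right.
example_clusters = [
    {"weight": 0.5, "mu": 0.0, "sigma2": 1.0, "alpha": 1.0,
     "prop_good": 0.9, "eta": 10.0},
    {"weight": 0.5, "mu": 10.0, "sigma2": 1.0, "alpha": 1.0,
     "prop_good": 0.9, "eta": 10.0},
]

for point in (0.0, -8.0, 10.5):
    print(point, classify(point, example_clusters))
```

Because the proportion of good points and the inflation factor are per-cluster parameters, a point is flagged relative to the cluster it is assigned to, not against a global cut-off. That is the sense in which this approach differs from trimming, where the fraction of discarded points must be chosen beforehand.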

