Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions
This work addresses clustering challenges in high-dimensional real-world data, but the contribution is incremental: it builds on existing mixture models and adds a new parameterization.
The authors tackled robust clustering of high-dimensional data with heavy-tailed or asymmetric clusters by proposing a sparse mixture of generalized hyperbolic distributions with a penalty term, developing an expectation-maximization algorithm, and validating it through simulations and real datasets.
Robust clustering of high-dimensional data is an important topic because clusters in real datasets are often heavy-tailed and/or asymmetric. Traditional approaches to model-based clustering often fail for high-dimensional data, e.g., due to the number of free covariance parameters. A parameterization of the component scale matrices for the mixture of generalized hyperbolic distributions is proposed. This parameterization includes a penalty term in the likelihood. An analytically feasible expectation-maximization algorithm is developed by placing a gamma-lasso penalty constraining the concentration matrix. The proposed methodology is investigated through simulation studies and illustrated using two real datasets.
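To make the remark about the number of free covariance parameters concrete, the following minimal sketch counts the free parameters in the scale matrices of a mixture whose components each carry an unrestricted symmetric p x p matrix. The function name `free_covariance_params` and the choice of three components are assumptions made for illustration; the count applies to unconstrained matrices in general and is not the parameterization proposed in the paper.

```python
def free_covariance_params(p: int, G: int) -> int:
    """Count free parameters in G unrestricted symmetric p x p matrices.

    Each symmetric p x p matrix has p * (p + 1) / 2 distinct entries
    (the diagonal plus one triangle), so the total grows quadratically in p.
    """
    return G * p * (p + 1) // 2


# Illustrative dimensions only; G = 3 is an arbitrary choice.
for p in (10, 100, 1000):
    print(f"p = {p:>4}: {free_covariance_params(p, G=3)} free scale parameters")
```

With three components, the count rises from 165 at p = 10 to 15,150 at p = 100, which is why a sparsity-inducing penalty on the concentration matrix is attractive when the dimension is large relative to the sample size.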