Mixed data Deep Gaussian Mixture Model: A clustering model for mixed datasets
The work addresses the clustering challenges faced by researchers who handle heterogeneous data types, although the contribution appears incremental because it builds on existing models such as Deep Gaussian Mixture Models.
The authors tackled the problem of clustering mixed datasets by introducing the Mixed Deep Gaussian Mixture Model (MDGMM), which automatically merges the clusterings performed separately on continuous and non-continuous data, and validated it by comparing its results with those of state-of-the-art models on several datasets.
Clustering mixed data presents numerous challenges inherent to the very heterogeneous nature of the variables. Despite this heterogeneity, a clustering algorithm should be able to extract discriminant pieces of information from the variables in order to design groups. In this work, we introduce a multilayer, model-based clustering method called the Mixed Deep Gaussian Mixture Model (MDGMM), which can be viewed as an automatic way to merge the clusterings performed separately on continuous and non-continuous data. This architecture is flexible and can be adapted to mixed data as well as to purely continuous or purely non-continuous data. In this sense, we generalize Generalized Linear Latent Variable Models and Deep Gaussian Mixture Models. We also design a new initialisation strategy and a data-driven method that selects the best specification of the model and the optimal number of clusters for a given dataset "on the fly". In addition, our model provides continuous low-dimensional representations of the data, which can be a useful tool for visualizing mixed datasets. Finally, we validate the performance of our approach by comparing its results with those of state-of-the-art mixed data clustering models on several commonly used datasets.
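To give a concrete feel for the multilayer idea, the sketch below samples from a toy one-dimensional, two-layer Deep Gaussian Mixture Model. It is only an illustration under stated assumptions: the abstract gives no equations, so the sketch follows the usual DGMM formulation in which each layer is a mixture of linear Gaussian maps applied to a deeper latent variable. All parameter values and the names `LAYER_1`, `LAYER_2`, `apply_layer` and `sample_dgmm` are hypothetical, and the non-continuous part of the MDGMM is not covered.

```python
import random

# Hypothetical parameters for a toy one-dimensional, two-layer DGMM.
# Each component is (mixing weight, offset eta, loading lambda, noise std).
LAYER_1 = [(0.5, -4.0, 1.0, 0.3), (0.5, 4.0, 0.5, 0.3)]  # maps z1 -> y
LAYER_2 = [(0.7, -1.0, 0.8, 0.2), (0.3, 1.5, 0.4, 0.2)]  # maps z2 -> z1


def pick_component(layer, rng):
    """Draw a component index according to the layer's mixing weights."""
    weights = [comp[0] for comp in layer]
    return rng.choices(range(len(layer)), weights=weights, k=1)[0]


def apply_layer(layer, z, rng):
    """Return (component index, eta + lambda * z + Gaussian noise)."""
    s = pick_component(layer, rng)
    _, eta, lam, noise_std = layer[s]
    return s, eta + lam * z + rng.gauss(0.0, noise_std)


def sample_dgmm(n, seed=0):
    """Sample n observations, each with the component path it followed."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z2 = rng.gauss(0.0, 1.0)                # deepest latent: standard normal
        s2, z1 = apply_layer(LAYER_2, z2, rng)  # second layer: z2 -> z1
        s1, y = apply_layer(LAYER_1, z1, rng)   # first layer: z1 -> observation
        draws.append((y, (s1, s2)))
    return draws


if __name__ == "__main__":
    for y, path in sample_dgmm(5):
        print(f"y = {y:+.3f}  path (s1, s2) = {path}")
```

In this toy setting, each pair (s1, s2) identifies one of 2 × 2 = 4 Gaussian paths, so the marginal distribution of y is a four-component Gaussian mixture built from only two components per layer.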