MMM: Clustering Multivariate Longitudinal Mixed-type Data
This work addresses the scarcity of algorithms for clustering mixed-type longitudinal data. The contribution is incremental: it extends existing mixture models to handle more complex data structures.
The authors tackle the problem of clustering multivariate longitudinal mixed-type data, which is challenging because within- and between-time dependencies must be modeled together. They introduce the MMM model, which handles several data types and performs clustering in a latent dimension, and they demonstrate its inference abilities on synthetic and financial data.
Multivariate longitudinal data of mixed type are increasingly collected in many scientific domains. However, algorithms to cluster this kind of data remain scarce, owing to the challenge of simultaneously modeling the within- and between-time dependence structures for multivariate data of mixed type. We introduce the Mixture of Mixed-Matrices (MMM) model. After reorganizing the data in a three-way structure and assuming that the non-continuous variables are observations of underlying latent continuous variables, the model relies on a mixture of matrix-variate normal distributions to perform clustering in the latent dimension. The MMM model is thus able to handle continuous, ordinal, binary, nominal and count data, and to concurrently model the heterogeneity, the association among the responses and the temporal dependence structure in a parsimonious way, without assuming conditional independence. Inference is carried out through an MCMC-EM algorithm, which is detailed. An evaluation on synthetic data demonstrates the model's inference abilities. A real-world application to financial data is presented.
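To make the core building block concrete, the following is a minimal Python sketch of a mixture of matrix-variate normal distributions applied to data stored in a three-way array (units by responses by time points). It is an illustration under simplifying assumptions, not the MMM implementation: it treats all variables as continuous, so the latent-variable step for ordinal, binary, nominal and count data and the MCMC-EM algorithm are both omitted. The function names `matrix_normal_logpdf` and `responsibilities`, the toy dimensions and the parameter values are all hypothetical.

```python
import numpy as np


def matrix_normal_logpdf(X, M, Sigma, Phi):
    """Log-density of a p x T matrix X under a matrix-variate normal.

    Sigma (p x p) is the covariance among responses (rows);
    Phi (T x T) is the covariance among time points (columns).
    """
    p, T = X.shape
    R = X - M
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_phi = np.linalg.slogdet(Phi)
    # tr(Sigma^{-1} R Phi^{-1} R^T), using linear solves rather than explicit inverses
    quad = np.trace(np.linalg.solve(Sigma, R) @ np.linalg.solve(Phi, R.T))
    return -0.5 * (p * T * np.log(2 * np.pi)
                   + T * logdet_sigma + p * logdet_phi + quad)


def responsibilities(X, weights, means, sigmas, phis):
    """Posterior probability that X belongs to each mixture component."""
    log_terms = np.array([
        np.log(w) + matrix_normal_logpdf(X, M, S, P)
        for w, M, S, P in zip(weights, means, sigmas, phis)
    ])
    log_terms -= log_terms.max()  # stabilise before exponentiating
    probs = np.exp(log_terms)
    return probs / probs.sum()


# Toy three-way data: n units, p responses, T time points (assumed sizes).
rng = np.random.default_rng(0)
n, p, T = 5, 3, 4
data = rng.normal(size=(n, p, T))

# Two hypothetical components that differ only in their mean matrices.
weights = [0.5, 0.5]
means = [np.zeros((p, T)), np.full((p, T), 3.0)]
sigmas = [np.eye(p), np.eye(p)]
phis = [np.eye(T), np.eye(T)]

# Assign each unit to the component with the highest posterior probability.
labels = [int(np.argmax(responsibilities(X, weights, means, sigmas, phis)))
          for X in data]
```

The separable covariance is what makes this formulation parsimonious: a p x p and a T x T matrix replace a full pT x pT covariance, while dependence among responses and across time is still modeled rather than assumed away.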