Statistical Latent Space Approach for Mixed Data Modelling and Applications
This work addresses the problem of handling mixed data in applications such as medical analysis and image retrieval, but the contribution is incremental because it builds on existing RBM models.
The paper tackles the challenge of modeling mixed data by extending the mixed-variate restricted Boltzmann machine with parameter sharing, balancing, structured sparsity, and distance metric learning to transform heterogeneous data into homogeneous representations. The results show improved performance over baseline methods on medical data and over state-of-the-art rivals on image datasets.
The analysis of mixed data raises ongoing challenges in statistics and machine learning. One of the two most prominent challenges is to develop new statistical techniques and methodologies that handle mixed data effectively by making the data less heterogeneous with minimal loss of information. The other is that such methods must be applicable to large-scale tasks involving huge amounts of mixed data. To tackle these challenges, we introduce parameter sharing and balancing extensions to our recent model, the mixed-variate restricted Boltzmann machine (MV.RBM), which can transform heterogeneous data into a homogeneous representation. We also integrate structured sparsity and distance metric learning into RBM-based models. Our proposed methods are applied to a range of applications, including latent patient profile modelling in medical data analysis and representation learning for image retrieval. The experimental results demonstrate that the models perform better than baseline methods on medical data and outperform state-of-the-art rivals on image data.
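To make the idea of turning heterogeneous data into a homogeneous representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic RBM-style hidden posterior, P(h_k = 1 | v) = sigmoid(b_k + sum_i v_i W_ik), with binary inputs used as-is, continuous inputs standardized to unit variance, and categorical inputs one-hot encoded. The function name `hidden_posterior`, the example record, and the randomly initialised weights are all hypothetical; the parameter sharing, balancing, structured sparsity, and metric learning extensions are not shown.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_posterior(visible, weights, hidden_bias):
    """Return P(h_k = 1 | v) for each hidden unit k.

    visible:     flat list of visible values, all data types concatenated.
    weights:     weights[i][k] links visible unit i to hidden unit k.
    hidden_bias: one bias per hidden unit.
    """
    probs = []
    for k, bias in enumerate(hidden_bias):
        activation = bias + sum(v * weights[i][k] for i, v in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


# Hypothetical mixed record (illustrative values only).
binary_part = [1.0, 0.0, 1.0]             # e.g. yes/no attributes
gaussian_part = [0.3, -1.2]               # standardized continuous attributes
categorical_part = [0.0, 1.0, 0.0, 0.0]   # one-hot encoded category
visible = binary_part + gaussian_part + categorical_part

# Assumption: small random weights stand in for trained parameters.
rng = random.Random(0)
n_hidden = 4
weights = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in visible]
hidden_bias = [0.0] * n_hidden

# Every entry is a probability, whatever the type of the inputs.
representation = hidden_posterior(visible, weights, hidden_bias)
print(representation)
```

Whatever the mix of input types, the output is a fixed-length vector of probabilities. In this sense a latent representation is homogeneous and can be passed to standard tools such as clustering or nearest-neighbour retrieval.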