Mixed-Variate Restricted Boltzmann Machines
This work addresses the challenge of handling complex, mixed-type data in data analysis and machine learning, and represents an incremental advance over standard Restricted Boltzmann Machines (RBMs).
The paper tackles the problem of modeling heterogeneous datasets with multiple variable types and modalities by introducing Mixed-Variate Restricted Boltzmann Machines, which are evaluated on feature extraction, data completion, and prediction using a world opinion survey dataset.
Modern datasets are becoming increasingly heterogeneous. To address this, we present Mixed-Variate Restricted Boltzmann Machines for simultaneously modeling variables of multiple types and modalities, including binary and continuous responses, categorical options, multicategorical choices, ordinal assessments and category-ranked preferences. Dependency among variables is modeled using latent binary variables, each of which can be interpreted as a particular hidden aspect of the data. Like the standard RBM, the proposed model allows fast evaluation of the posterior for the latent variables. Hence, it is naturally suited to many common tasks, including, but not limited to, use as (a) a pre-processing step that converts complex input data into a more convenient vectorial representation through the latent posteriors, thereby offering dimensionality reduction, (b) a classifier supporting binary, multiclass, multilabel, and label-ranking outputs, or a regression tool for continuous outputs, and (c) a data completion tool for multimodal and heterogeneous data. We evaluate the proposed model on a large-scale dataset of world opinion survey results on three tasks: feature extraction and visualization, data completion, and prediction.
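To make the "fast evaluation of the posterior" concrete, the following is a minimal sketch of how latent posteriors might be computed for one mixed-type record. It is not the paper's exact parameterization: it assumes only three input types (binary, unit-variance continuous, and one-hot categorical), and the names `hidden_posterior`, `W_bin`, `W_cont`, `W_cat` and `b` are hypothetical. Under these assumptions, each hidden unit's posterior is a sigmoid of a bias plus the summed contributions of all visible variables, so the whole vector is obtained in a single pass.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_posterior(v_bin, v_cont, v_cat, params):
    """Return P(h_k = 1 | v) for every hidden unit k.

    Assumptions (illustrative, not taken from the paper):
      - v_bin:  1-D array of 0/1 values
      - v_cont: 1-D array of continuous values with unit variance
      - v_cat:  list of category indices, one per categorical variable
    """
    act = params["b"].copy()              # hidden biases
    act += v_bin @ params["W_bin"]        # binary contributions
    act += v_cont @ params["W_cont"]      # continuous contributions
    for j, c in enumerate(v_cat):
        # Selecting row c is equivalent to multiplying by a one-hot vector.
        act += params["W_cat"][j][c]
    return sigmoid(act)

rng = np.random.default_rng(0)
n_hidden = 4
params = {
    "b": np.zeros(n_hidden),
    "W_bin": rng.normal(scale=0.1, size=(3, n_hidden)),
    "W_cont": rng.normal(scale=0.1, size=(2, n_hidden)),
    # One categorical variable with 5 options.
    "W_cat": [rng.normal(scale=0.1, size=(5, n_hidden))],
}

posterior = hidden_posterior(
    np.array([1.0, 0.0, 1.0]),   # three binary responses
    np.array([0.3, -1.2]),       # two continuous responses
    [2],                         # third option chosen
    params,
)
```

The resulting `posterior` vector is the kind of fixed-length representation that task (a) refers to; the other variable types listed in the abstract (multicategorical, ordinal, category-ranked) would each need their own contribution term, which this sketch omits.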