LG, ML | Jun 21, 2020

VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Chao Ma, Sebastian Tschiatschek, José Miguel Hernández-Lobato, Richard Turner, Cheng Zhang

arXiv:2006.11941v1 | 22.695 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying deep generative models to real-world heterogeneous datasets. It is an incremental advance relevant to data science and machine learning applications.

The authors address the poor performance of deep generative models on heterogeneous mixed-type data by proposing VAEM, a two-stage extension of the variational autoencoder that handles differing feature types and marginal distributions. They demonstrate it on data generation, missing data prediction, and sequential feature selection tasks.

Deep generative models often perform poorly in real-world applications due to the heterogeneity of natural data sets. Heterogeneity arises from data containing different types of features (categorical, ordinal, continuous, etc.) and features of the same type having different marginal distributions. We propose an extension of variational autoencoders (VAEs) called VAEM to handle such heterogeneous data. VAEM is a deep generative model that is trained in a two-stage manner such that the first stage provides a more uniform representation of the data to the second stage, thereby sidestepping the problems caused by heterogeneous data. We provide extensions of VAEM to handle partially observed data, and demonstrate its performance in data generation, missing data prediction and sequential feature selection tasks. Our results show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
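
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the VAEM implementation: it assumes a per-feature empirical-CDF plus probit transform in place of the first-stage model, and a single Gaussian correlation in place of the second-stage model (in effect a Gaussian copula). The names `MarginalStage`, `pearson`, `income`, and `score` are hypothetical and chosen only for illustration. The point it shows is the data flow: each feature is first mapped to a code on a common scale, and the dependency model only ever sees those codes.

```python
import bisect
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


class MarginalStage:
    """Stage-1 stand-in (assumption): a separate model fitted to ONE feature."""

    def fit(self, column):
        self.sorted_vals = sorted(column)
        return self

    def encode(self, x):
        # Empirical CDF value, kept strictly inside (0, 1), then probit.
        n = len(self.sorted_vals)
        rank = bisect.bisect_right(self.sorted_vals, x)
        u = min(max((rank - 0.5) / n, 0.5 / n), 1.0 - 0.5 / n)
        return STD_NORMAL.inv_cdf(u)

    def decode(self, z):
        # Map a code back to the feature's own scale via the empirical quantile.
        n = len(self.sorted_vals)
        idx = min(int(STD_NORMAL.cdf(z) * n), n - 1)
        return self.sorted_vals[idx]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)

# Toy data: two dependent continuous features with very different marginals.
latent = [rng.gauss(0, 1) for _ in range(500)]
income = [1000.0 * math.exp(t + 0.3 * rng.gauss(0, 1)) for t in latent]  # skewed, large scale
score = [t + 0.3 * rng.gauss(0, 1) for t in latent]                      # symmetric, small scale

# Stage 1: fit one marginal model per feature, independently of the others.
stage1 = [MarginalStage().fit(income), MarginalStage().fit(score)]
codes = [[m.encode(x) for x in col] for m, col in zip(stage1, (income, score))]

# Stage 2: model dependencies between the (now comparably scaled) codes.
rho = pearson(codes[0], codes[1])


def sample():
    # Draw correlated codes from stage 2, then decode each with its stage-1 model.
    z1 = rng.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0, 1)
    return stage1[0].decode(z1), stage1[1].decode(z2)


samples = [sample() for _ in range(5)]
```

Because the second stage works only on the codes, it never has to cope with the raw scale or skew of `income`. That is the sense in which a first stage can hand a more uniform representation to the second.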

