LG · AI · ML · Jul 10, 2018

Handling Incomplete Heterogeneous Data using VAEs

arXiv:1807.03653v4 · 443 citations
Originality: Incremental advance
AI Analysis

The paper addresses a data problem common in real-world fields such as healthcare and finance, but the advance is incremental because it builds on existing VAE methods.

The authors address incomplete heterogeneous data (mixed continuous and discrete attributes with missing values) by proposing HI-VAE, a general VAE framework that supports multiple data types. In supervised tasks, HI-VAE is competitive in predictive performance and outperforms supervised models when trained on incomplete data.

Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with data missing at random), both of which are common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
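To make the idea of per-type likelihood models over incomplete rows concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, for illustration only, a Gaussian likelihood for real-valued attributes, a log-normal for positive real-valued ones, a Poisson for counts, and a softmax categorical. The function names, the `kinds` labels, and the `params` dictionaries are hypothetical; in a VAE these parameters would come from a decoder network. The key point is that missing entries are skipped, so only observed attributes contribute to the reconstruction term.

```python
import math


def gaussian_logpdf(x, mean, var):
    # Log density of a univariate Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def lognormal_logpdf(x, mean, var):
    # Gaussian on log(x), plus the change-of-variables term -log(x).
    return gaussian_logpdf(math.log(x), mean, var) - math.log(x)


def poisson_logpmf(k, rate):
    # Log probability of count k under a Poisson with the given rate.
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def categorical_logpmf(k, logits):
    # Log-softmax of the logits, evaluated at class index k.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[k] - log_norm


def masked_log_likelihood(row, observed, kinds, params):
    """Sum per-attribute log-likelihoods over observed entries only."""
    total = 0.0
    for value, is_obs, kind, p in zip(row, observed, kinds, params):
        if not is_obs:
            continue  # a missing entry contributes nothing
        if kind == "real":
            total += gaussian_logpdf(value, p["mean"], p["var"])
        elif kind == "positive":
            total += lognormal_logpdf(value, p["mean"], p["var"])
        elif kind == "count":
            total += poisson_logpmf(value, p["rate"])
        elif kind == "categorical":
            total += categorical_logpmf(value, p["logits"])
        else:
            raise ValueError(f"unknown attribute type: {kind}")
    return total


# Hypothetical row with one missing positive-valued attribute (None).
row = [1.2, None, 3, 0]
observed = [True, False, True, True]
kinds = ["real", "positive", "count", "categorical"]
params = [
    {"mean": 1.0, "var": 1.0},
    {"mean": 0.0, "var": 1.0},
    {"rate": 2.0},
    {"logits": [0.0, 0.0]},
]
print(masked_log_likelihood(row, observed, kinds, params))
```

Under these assumptions, imputing a missing entry would amount to reading a point estimate off the predicted distribution for that attribute, for example the mean of the Gaussian or the most probable category.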

Code Implementations: 3 repos