ML | LG | ME | Jun 23, 2020

not-MIWAE: Deep Generative Modelling with Missing not at Random Data

arXiv:2006.12871v2 | 82 citations
AI Analysis

The paper addresses a critical issue for machine learning in domains where data are self-censored or subject to similar MNAR mechanisms, though it is an incremental improvement over existing methods for handling missing data.

The paper tackles the problem of missing not at random (MNAR) data in deep generative models by explicitly modeling the missing process with a deep neural network, resulting in improved likelihood-based inference across various datasets and missingness patterns.

When a missing process depends on the missing values themselves, it needs to be explicitly modelled and taken into account while doing likelihood-based inference. We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process is dependent on the missing data. Specifically, a deep neural network enables us to flexibly model the conditional distribution of the missingness pattern given the data. This allows for incorporating prior information about the type of missingness (e.g. self-censoring) into the model. Our inference technique, based on importance-weighted variational inference, involves maximising a lower bound of the joint likelihood. Stochastic gradients of the bound are obtained by using the reparameterisation trick both in latent space and data space. We show on various kinds of data sets and missingness patterns that explicitly modelling the missing process can be invaluable.
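The inference scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-dimensional latent, the linear Gaussian decoder `W`, the toy "encoder" (mean of the observed entries), and the logistic self-censoring parameters `w_miss` and `b_miss` are all hypothetical stand-ins for the neural networks the paper uses, and no gradients are computed. The sketch only shows how an importance-weighted estimate of a lower bound on the joint likelihood of the observed values and the missingness pattern could be assembled, with reparameterised samples drawn in both latent space and data space.

```python
import numpy as np

def log_normal(x, mean, std):
    """Elementwise log density of N(mean, std^2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_sigmoid(a):
    """Numerically stable log(sigmoid(a))."""
    return -np.logaddexp(0.0, -a)

def not_miwae_log_weights(x, s, K, rng, W, w_miss, b_miss, sigma_x=1.0):
    """Return K log importance weights for one data point.

    x: (D,) data, zero-filled where missing; s: (D,) mask, 1 = observed.
    All model components here are toy assumptions, not the paper's networks.
    """
    D = x.shape[0]
    obs = s.astype(bool)

    # Toy "encoder": q(z | x_obs) = N(mean of observed entries, 1).
    q_mean = x[obs].mean() if obs.any() else 0.0
    q_std = 1.0
    z = q_mean + q_std * rng.standard_normal(K)      # reparameterised, latent space

    # Toy decoder: p(x | z) = N(W z, sigma_x^2), independent per feature.
    x_mean = z[:, None] * W[None, :]                 # shape (K, D)
    x_samp = x_mean + sigma_x * rng.standard_normal((K, D))  # reparameterised, data space
    x_full = np.where(obs, x, x_samp)                # keep observed, impute missing

    # Self-censoring mechanism: p(s_d = 1 | x_d) = sigmoid(w_miss * x_d + b_miss).
    logits = w_miss * x_full + b_miss
    log_p_s = np.where(obs, log_sigmoid(logits), log_sigmoid(-logits)).sum(axis=1)

    log_p_xo = np.where(obs, log_normal(x, x_mean, sigma_x), 0.0).sum(axis=1)
    log_p_z = log_normal(z, 0.0, 1.0)
    log_q_z = log_normal(z, q_mean, q_std)
    return log_p_s + log_p_xo + log_p_z - log_q_z

def iw_bound(log_w):
    """log of the average importance weight, computed stably."""
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

rng = np.random.default_rng(0)
W = np.array([1.0, -0.5, 2.0])       # hypothetical decoder weights
x = np.array([0.3, 0.0, 1.2])        # second entry is missing (zero-filled)
s = np.array([1, 0, 1])
log_w = not_miwae_log_weights(x, s, K=50, rng=rng, W=W, w_miss=-2.0, b_miss=1.0)
bound = iw_bound(log_w)
print(bound)
```

Because each missing entry is sampled from the decoder and passed through the missingness model, the term `log_p_s` depends on the imputed values; this is where prior knowledge such as self-censoring enters the bound. In a trainable version, the reparameterised samples would let gradients flow to the encoder, decoder, and missingness model.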

Code Implementations: 2 repos
