ML · LG · Sep 10, 2020

Generalized Multi-Output Gaussian Process Censored Regression

arXiv:2009.04822v2 · 19 citations
AI Analysis

This paper addresses bias in censored-data modeling for applications such as survival analysis and econometrics, but the contribution is incremental: it builds on existing Gaussian process and multi-output methods.

The paper tackles bias in censored-data regression by introducing a heteroscedastic multi-output Gaussian process model that leverages correlations between outputs. It reports improved estimation of the true underlying process on synthetic and real-world tasks.

When modelling censored observations, a typical approach in current regression methods is to use a censored-Gaussian (i.e. Tobit) model to describe the conditional output distribution. In this paper, as in the case of missing data, we argue that exploiting correlations between multiple outputs can enable models to better address the bias introduced by censored data. To do so, we introduce a heteroscedastic multi-output Gaussian process model which combines the non-parametric flexibility of GPs with the ability to leverage information from correlated outputs under input-dependent noise conditions. To address the resulting inference intractability, we further devise a variational bound to the marginal log-likelihood suitable for stochastic optimization. We empirically evaluate our model against other generative models for censored data on both synthetic and real-world tasks and further show how it can be generalized to deal with arbitrary likelihood functions. Results show how the added flexibility allows our model to better estimate the underlying non-censored (i.e. true) process under potentially complex censoring dynamics.
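
The censored-Gaussian (Tobit) likelihood that the abstract names as the typical baseline can be illustrated with a minimal sketch. This is not the paper's heteroscedastic multi-output model. It assumes right-censoring, a single output, and one fixed noise scale, and the function and argument names (`tobit_loglik`, `censored`, `mean`, `sigma`) are hypothetical, chosen only for illustration. Uncensored points contribute a Gaussian log-density; censored points contribute the log-probability that the latent value lies at or above the recorded threshold.

```python
import math


def normal_logpdf(z):
    """Log-density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(z):
    """Log of the standard normal CDF at z (adequate for moderate |z| only)."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(y, censored, mean, sigma):
    """Log-likelihood of right-censored observations under a Gaussian model.

    Assumptions (illustrative, not taken from the paper):
      - y[i] is the recorded value; if censored[i] is True, the latent
        value is only known to be >= y[i].
      - mean[i] is the model's predicted latent mean; sigma is a single
        fixed noise standard deviation (homoscedastic).
    """
    total = 0.0
    for yi, ci, mi in zip(y, censored, mean):
        z = (yi - mi) / sigma
        if ci:
            # log P(latent >= yi) = log(1 - Phi(z)) = log Phi(-z)
            total += normal_logcdf(-z)
        else:
            # Gaussian log-density of the observed value
            total += normal_logpdf(z) - math.log(sigma)
    return total


if __name__ == "__main__":
    y = [0.3, 1.0, 1.0]            # the last two readings hit a ceiling of 1.0
    censored = [False, True, True]
    mean = [0.2, 0.9, 1.4]
    print(tobit_loglik(y, censored, mean, sigma=0.5))
```

Treating a censored reading as an exact observation would pull the fitted mean towards the threshold. The survival term instead only requires the latent value to exceed it, which is the bias correction this baseline provides.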

Code Implementations: 1 repo