ML · LG · AP · CO · Feb 4, 2021

Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types

arXiv:2102.02642v2
AI Analysis

This work provides a more accurate and versatile imputation method for machine learning practitioners and researchers dealing with mixed data types, particularly in surveys and medical applications, representing an incremental improvement over existing Gaussian copula models.

This paper addresses the problem of missing values in mixed data types by improving Gaussian copula models. It introduces a more precise approximation for model estimation and imputation using randomized quasi-Monte Carlo procedures, leading to lower errors in estimated parameters and imputed values. The method also extends support to include unordered multinomial variables.

Missing values in data with mixed types are a common problem in many machine learning applications, such as survey processing and various medical applications. Recently, Gaussian copula models have been suggested as a means of imputing missing values within a probabilistic framework. While the present Gaussian copula models have been shown to yield state-of-the-art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise, and they do not support unordered multinomial variables. We address the first limitation by using direct and arbitrarily precise approximations, for both model estimation and imputation, based on randomized quasi-Monte Carlo procedures. The method we provide yields lower errors for the estimated model parameters and the imputed values than previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables, in addition to the existing support for ordinal, binary, and continuous variables.
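To make the idea of an "arbitrarily precise" randomized quasi-Monte Carlo approximation concrete, the sketch below estimates a multivariate normal box probability, the kind of integral that arises when ordinal or binary variables are modelled through latent Gaussian variables. It is only an illustration under stated assumptions, not the paper's implementation: the function name `mvn_box_prob_rqmc` and its parameters are hypothetical, the sequential conditioning transform is a standard Genz-style separation of variables, and scrambled Sobol points from SciPy stand in for whatever point set the authors use.

```python
import numpy as np
from scipy.stats import norm, qmc


def mvn_box_prob_rqmc(lower, upper, cov, n_points=4096, n_repeats=8, seed=0):
    """Estimate P(lower < Z < upper) for Z ~ N(0, cov) with randomized QMC.

    Hypothetical helper for illustration. Returns the estimate and a
    standard error computed from independent scramblings.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    dim = lower.size

    estimates = []
    for k in range(n_repeats):
        # Each repeat uses an independently scrambled Sobol sequence.
        sampler = qmc.Sobol(d=max(dim - 1, 1), scramble=True, seed=seed + k)
        u = sampler.random(n_points)

        y = np.zeros((n_points, dim))   # sampled standardized latent values
        weight = np.ones(n_points)      # running product of conditional probabilities
        for i in range(dim):
            # Contribution of the already sampled coordinates to coordinate i.
            shift = y[:, :i] @ chol[i, :i]
            lo = norm.cdf((lower[i] - shift) / chol[i, i])
            hi = norm.cdf((upper[i] - shift) / chol[i, i])
            weight *= hi - lo
            if i < dim - 1:
                # Sample coordinate i inside its truncation interval.
                p = np.clip(lo + u[:, i] * (hi - lo), 1e-12, 1 - 1e-12)
                y[:, i] = norm.ppf(p)
        estimates.append(weight.mean())

    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_repeats)


# Bivariate check with correlation 0.5: the exact value of P(Z1 < 0, Z2 < 0)
# is 1/4 + arcsin(0.5) / (2 * pi) = 1/3.
est, se = mvn_box_prob_rqmc([-np.inf, -np.inf], [0.0, 0.0],
                            [[1.0, 0.5], [0.5, 1.0]])
```

Increasing `n_points` or `n_repeats` tightens the estimate, and the spread across scramblings gives an error estimate, which is the sense in which such an approximation can be made as precise as the computational budget allows.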

Code Implementations: 1 repo