ME | ST | ML | Feb 21, 2019

An information criterion for auxiliary variable selection in incomplete data analysis

arXiv:1902.07954v2 | 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of variable selection for statisticians and data analysts dealing with incomplete datasets, offering a method to enhance estimation accuracy when auxiliary variables are relevant.

The paper tackles the problem of selecting auxiliary variables to improve estimation for primary variables in incomplete data analysis. It proposes an information criterion that is an asymptotically unbiased estimator of the Kullback-Leibler divergence for the complete data of the primary variables, shows that the criterion is asymptotically equivalent to a variant of leave-one-out cross-validation, and demonstrates its performance in a simulation study and a real data example.

Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. By utilizing a parametric model of the joint distribution of primary and auxiliary variables, it is possible to improve the estimation of the parametric model for the primary variables when the auxiliary variables are closely related to the primary variables. However, the estimation accuracy decreases when the auxiliary variables are irrelevant to the primary variables. To select useful auxiliary variables, we formulate the problem as model selection and propose an information criterion for predicting primary variables by leveraging auxiliary variables. The proposed information criterion is an asymptotically unbiased estimator of the Kullback-Leibler divergence for complete data of primary variables under some reasonable conditions. We also clarify an asymptotic equivalence between the proposed information criterion and a variant of leave-one-out cross-validation. Performance of our method is demonstrated via a simulation study and a real data example.
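
The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's information criterion; it only shows the motivating phenomenon under assumptions chosen for illustration: a bivariate normal model, one primary variable `y` that is missing completely at random for most units, and one fully observed auxiliary variable `x`. The function name `simulate_mse` and all parameter values are hypothetical. It compares the complete-case mean of `y` with the standard joint-model maximum likelihood estimate for this monotone missingness pattern (a regression adjustment using the full-sample mean of `x`).

```python
import numpy as np

def simulate_mse(rho, n=100, n_obs=30, reps=2000, seed=0):
    """Monte Carlo MSE of two estimators of the mean of a primary variable y.

    Assumed setup (illustrative only): (x, y) is standard bivariate normal
    with correlation rho; y is observed for the first n_obs of n units,
    while the auxiliary variable x is observed for all n units.
    Returns (mse_complete_case, mse_joint_model).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    err_cc, err_joint = [], []
    for _ in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x, y = xy[:, 0], xy[:, 1]
        x_obs, y_obs = x[:n_obs], y[:n_obs]
        # Complete-case estimate: ignores the auxiliary variable entirely.
        mu_cc = y_obs.mean()
        # Joint-model estimate: regress y on x where both are observed,
        # then adjust using the mean of x over all n units.
        beta = np.cov(x_obs, y_obs, bias=True)[0, 1] / x_obs.var()
        mu_joint = y_obs.mean() + beta * (x.mean() - x_obs.mean())
        # The true mean of y is 0, so the squared estimate is the squared error.
        err_cc.append(mu_cc ** 2)
        err_joint.append(mu_joint ** 2)
    return float(np.mean(err_cc)), float(np.mean(err_joint))

if __name__ == "__main__":
    for rho in (0.9, 0.0):
        mse_cc, mse_joint = simulate_mse(rho)
        print(f"rho={rho}: complete-case MSE={mse_cc:.4f}, joint-model MSE={mse_joint:.4f}")
```

With a strongly related auxiliary variable (`rho=0.9`) the joint-model estimate should show a clearly lower mean squared error. With an irrelevant one (`rho=0.0`) no gain is expected, and estimating the extra regression coefficient only adds noise. Deciding between these two cases from the data is the selection problem the proposed criterion is designed for.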
