ME AI LG Jun 27, 2012

The AI&M Procedure for Learning from Incomplete Data

arXiv:1206.6830v1, 18 citations

AI Analysis

This work addresses a specific problem in statistical learning for researchers dealing with incomplete data, but it is incremental as it builds on existing methods like EM.

The authors tackled parameter learning from incomplete data with non-random missingness by introducing the AI&M procedure, which optimizes the profile likelihood in the data completion space rather than the parameter space, showing that likelihood-based inference remains feasible and conservative inference is weaker than necessary.

We investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account. Optimizing this profile likelihood poses two main difficulties: multiple (local) maxima, and its very high-dimensional parameter space. In this paper a new method is presented for optimizing the profile likelihood that addresses the second difficulty: in the proposed AI&M (adjusting imputation and maximization) procedure the optimization is performed by operations in the space of data completions, rather than directly in the parameter space of the profile likelihood. We apply the AI&M method to learning parameters for Bayesian networks. The method is compared against conservative inference, which takes into account each possible data completion, and against EM. The results indicate that likelihood-based inference is still feasible in the case of unknown missingness mechanisms, and that conservative inference is unnecessarily weak. On the other hand, our results also provide evidence that the EM algorithm is still quite effective when the data is not missing at random.
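
To make the two baselines named in the abstract concrete, here is a minimal toy sketch for a single binary variable. It is not the AI&M procedure and is not taken from the paper; the function names and the toy data are hypothetical. Conservative inference is illustrated as the range of the maximum-likelihood estimate over every possible completion of the missing values, and EM is run under the usual missing-at-random assumption.

```python
from fractions import Fraction


def conservative_bounds(observed, n_missing):
    """Range of the ML estimate of P(X=1) over all completions of the missing values.

    The lower bound fills every missing value with 0, the upper bound with 1.
    """
    n = len(observed) + n_missing
    ones = sum(observed)
    return Fraction(ones, n), Fraction(ones + n_missing, n)


def em_estimate(observed, n_missing, theta=0.5, iters=50):
    """EM for one Bernoulli parameter, assuming values are missing at random."""
    n = len(observed) + n_missing
    ones = sum(observed)
    for _ in range(iters):
        # E-step: expected number of ones among the missing values.
        expected_ones = ones + n_missing * theta
        # M-step: ML estimate from the expected complete-data counts.
        theta = expected_ones / n
    return theta


# Hypothetical toy data: six observed values, four missing.
observed = [1, 1, 1, 0, 0, 1]
n_missing = 4

low, high = conservative_bounds(observed, n_missing)
theta_em = em_estimate(observed, n_missing)

print(f"conservative interval: [{low}, {high}]")
print(f"EM point estimate:     {theta_em:.4f}")
```

In this toy case the conservative interval is wide because it admits every completion, while EM converges to the observed frequency. The abstract's claim is that a likelihood-based treatment of unknown missingness mechanisms can be more informative than the full conservative interval.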
