LG, AI, ML | Mar 5, 2019

What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features

arXiv:1903.01620v2 | 45 citations
Originality: Incremental advance
AI Analysis

This paper addresses a practical challenge for users of discriminative classifiers, namely missing feature values at prediction time, and offers an incremental improvement over existing methods for handling missing data.

The paper tackles the problem of missing feature values at prediction time. It proposes a framework that computes the expected prediction with respect to a feature distribution. The reported results show that the approach matches logistic regression when all features are observed and outperforms standard imputation techniques when features are missing.

While discriminative classifiers often yield strong predictive performance, missing feature values at prediction time can still be a challenge. Classifiers may not behave as expected under certain ways of substituting the missing values, since they inherently make assumptions about the data distribution they were trained on. In this paper, we propose a novel framework that classifies examples with missing features by computing the expected prediction with respect to a feature distribution. Moreover, we use geometric programming to learn a naive Bayes distribution that embeds a given logistic regression classifier and can efficiently take its expected predictions. Empirical evaluations show that our model achieves the same performance as the logistic regression with all features observed, and outperforms standard imputation techniques when features go missing during prediction time. Furthermore, we demonstrate that our method can be used to generate "sufficient explanations" of logistic regression classifications, by removing features that do not affect the classification.
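The idea of an expected prediction can be illustrated with a small sketch. It is not the paper's method: it assumes binary features that are mutually independent with known marginals (a simplification of the naive Bayes distribution the abstract describes), and it enumerates all completions of the missing features by brute force rather than using the geometric-programming construction. The function names `expected_prediction` and `mean_imputed_prediction` are hypothetical and chosen only for this illustration.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Logistic regression output for a fully observed feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def expected_prediction(weights, bias, x, marginals):
    """Average the classifier output over all completions of missing features.

    x uses None for a missing feature. marginals[i] is the assumed
    P(X_i = 1); features are assumed independent (illustrative assumption).
    Cost is exponential in the number of missing features.
    """
    missing = [i for i, v in enumerate(x) if v is None]
    total = 0.0
    for values in itertools.product([0, 1], repeat=len(missing)):
        filled = list(x)
        prob = 1.0
        for i, v in zip(missing, values):
            filled[i] = v
            prob *= marginals[i] if v == 1 else 1.0 - marginals[i]
        total += prob * predict(weights, bias, filled)
    return total


def mean_imputed_prediction(weights, bias, x, marginals):
    """Baseline: replace each missing feature with its mean, then predict."""
    filled = [marginals[i] if v is None else v for i, v in enumerate(x)]
    return predict(weights, bias, filled)


if __name__ == "__main__":
    weights, bias = [4.0, -1.0], 0.0
    marginals = [0.5, 0.3]
    x = [None, 1]  # first feature is missing
    print(expected_prediction(weights, bias, x, marginals))
    print(mean_imputed_prediction(weights, bias, x, marginals))
```

Because the sigmoid is nonlinear, averaging the predictions generally gives a different value than predicting at the averaged input, which is one way to see why substituting missing values can make a classifier behave unexpectedly.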

