NELGMLDec 19, 2013

Missing Value Imputation With Unsupervised Backpropagation

arXiv:1312.5394v118 citations
Originality Incremental advance
AI Analysis

This addresses data quality issues for data mining and analysis applications, representing an incremental improvement over existing imputation methods.

The paper tackles the problem of missing values in datasets by introducing Unsupervised Backpropagation (UBP), a technique that trains a multi-layer perceptron to fit data manifolds, resulting in significantly lower sum-squared error for imputation and higher classification accuracy across 24 datasets and 9 algorithms.

Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing values still exhibit deteriorated accuracy. One approach to handling missing values is to fill in (impute) the missing values. In this paper, we present a technique for unsupervised learning called Unsupervised Backpropagation (UBP), which trains a multi-layer perceptron to fit to the manifold sampled by a set of observed point-vectors. We evaluate UBP with the task of imputing missing values in datasets, and show that UBP is able to predict missing values with significantly lower sum-squared error than other collaborative filtering and imputation techniques. We also demonstrate with 24 datasets and 9 supervised learning algorithms that classification accuracy is usually higher when randomly-withheld values are imputed using UBP, rather than with other methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes