LG | ML | Nov 28, 2020

Learning from Incomplete Features by Simultaneous Training of Neural Networks and Sparse Coding

arXiv:2011.14047v2
AI Analysis

This work provides a method for training classifiers on incomplete datasets, a common situation in real-world applications where not all features are collected for every sample. It offers an incremental improvement over existing imputation methods.

This paper addresses the problem of training classifiers on datasets with incomplete features, where only a subset of features is available per data instance. The authors develop a supervised learning method that simultaneously trains a classifier (e.g., logistic regression or a deep neural network) and learns sparse representations of the data, and they report that it is effective compared with traditional data imputation approaches and one state-of-the-art algorithm.

In this paper, the problem of training a classifier on a dataset with incomplete features is addressed. We assume that different subsets of features (random or structured) are available at each data instance. This situation typically occurs in applications where not all the features are collected for every data sample. A new supervised learning method is developed to train a general classifier, such as a logistic regression or a deep neural network, using only a subset of features per sample, while assuming sparse representations of data vectors on an unknown dictionary. Sufficient conditions are identified such that, if it is possible to train a classifier on incomplete observations so that their reconstructions are well separated by a hyperplane, then the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results on synthetic and well-known datasets are presented that validate our theoretical findings and demonstrate the effectiveness of the proposed method compared to traditional data imputation approaches and one state-of-the-art algorithm.
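To make the idea in the abstract concrete, the sketch below jointly fits a dictionary, sparse codes, and a logistic regression classifier from masked observations. It is an illustrative toy under stated assumptions, not the authors' algorithm. The objective (masked squared reconstruction error, plus an L1 penalty on the codes, plus logistic loss on the reconstructions), the simultaneous proximal-gradient updates, and all names and hyperparameters (`train_joint`, `lam`, `gamma`, `lr`, `n_atoms`) are assumptions introduced here for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def train_joint(X, M, y, n_atoms=8, lam=0.05, gamma=1.0, lr=0.05,
                n_iter=300, seed=0):
    """Hypothetical joint training loop (assumed objective, not from the paper).

    X: (n, d) data; entries where M == 0 are never used.
    M: (n, d) binary mask, 1 = feature observed.
    y: (n,) labels in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    D = rng.normal(size=(d, n_atoms))
    D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
    Z = np.zeros((n, n_atoms))               # sparse codes, one row per sample
    w, b = np.zeros(d), 0.0                  # logistic regression parameters
    Xo = np.where(M > 0, X, 0.0)             # zero out unobserved entries
    for _ in range(n_iter):
        R = Z @ D.T                          # reconstructions of the full vectors
        err = (sigmoid(R @ w + b) - y)[:, None]   # d(log-loss)/d(logit)
        # Gradient w.r.t. R: reconstruction error on observed entries only,
        # plus the classifier's pull on the reconstruction.
        G = M * (R - Xo) + gamma * err * w[None, :]
        grad_Z = G @ D
        grad_D = G.T @ Z / n
        grad_w = gamma * (err * R).mean(axis=0)
        grad_b = gamma * err.mean()
        Z = soft_threshold(Z - lr * grad_Z, lr * lam)   # sparse-code step
        D = D - lr * grad_D
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # renormalise atoms
        w = w - lr * grad_w
        b = b - lr * grad_b
    return D, Z, w, b

# Synthetic check: sparse codes on a random dictionary, about 70% of features observed.
rng = np.random.default_rng(1)
n, d, k = 200, 20, 8
D_true = rng.normal(size=(d, k))
D_true /= np.linalg.norm(D_true, axis=0)
Z_true = rng.normal(size=(n, k)) * (rng.random((n, k)) < 0.4)
X_full = Z_true @ D_true.T
y = (X_full @ rng.normal(size=d) > 0).astype(float)
M = (rng.random((n, d)) < 0.7).astype(float)

D, Z, w, b = train_joint(X_full, M, y)
R = Z @ D.T
train_acc = np.mean((sigmoid(R @ w + b) > 0.5) == (y > 0.5))
print(f"training accuracy on reconstructions: {train_acc:.2f}")
```

The classifier here only ever sees the reconstructions `Z @ D.T`, which mirrors the setting of the separation result stated in the abstract: the hyperplane is fit to reconstructions of incomplete observations, and the question is whether it also separates the unobserved full vectors. Swapping the logistic regression for a deep network would require an autodiff framework and is left out of this sketch.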

Code Implementations: 1 repo