LG AI ML

Feb 5, 2020

Exploratory Machine Learning with Unknown Unknowns

Peng Zhao, Jia-Wei Shan, Yu-Jie Zhang, Zhi-Hua Zhou

arXiv:2002.01605v25.830 citations

Originality: Highly original

AI Analysis

This work addresses a critical issue in supervised learning: when the label space is incompletely perceived, instances of hidden classes are misclassified as known ones. It offers a novel approach for improving model robustness in domains with hidden classes.

The paper tackles the problem of unknown classes in the training data that are mislabeled as known classes. It proposes exploratory machine learning, which actively augments the feature space to discover hidden classes, and validates the approach on synthetic and real datasets.

In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model classifies unseen instances into the known labels. This paper studies a new problem setting in which the training data contain unknown classes that are misperceived as other labels, so their existence cannot be inferred from the given supervision. We attribute these unknown unknowns to the fact that the training dataset is badly advised by an incompletely perceived label space, which in turn results from insufficient feature information. To this end, we propose exploratory machine learning, which examines and investigates the training data by actively augmenting the feature space to discover potentially hidden classes. Our method consists of three ingredients: a rejection model, feature exploration, and a model cascade. We provide theoretical analysis to justify its superiority, and validate its effectiveness on both synthetic and real datasets.
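The three ingredients named in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes nearest-centroid models with hand-set centroids, a margin-based reject rule, and a single extra feature obtained through a hypothetical `acquire_feature` lookup. The paper's actual rejection model, feature-selection strategy, and training procedure are not described in this abstract, and every name and threshold below is an illustrative assumption.

```python
from math import dist

def nearest_two(centroids, x):
    """Return the two closest (distance, label) pairs, nearest first."""
    return sorted((dist(x, mu), label) for label, mu in centroids.items())[:2]

def stage1(centroids, x, margin):
    """Rejection model on the original features (assumed rule):
    abstain when the two closest classes are nearly tied."""
    (d1, label), (d2, _) = nearest_two(centroids, x)
    return label if d2 - d1 >= margin else None

def stage2(centroids, x_aug, radius):
    """Model on the augmented features (assumed rule):
    abstain when the instance is far from every known class."""
    d1, label = nearest_two(centroids, x_aug)[0]
    return label if d1 <= radius else None

def cascade(x, acquire_feature, c1, c2, margin, radius):
    """Model cascade: accept at stage 1 if confident; otherwise acquire one
    extra feature and retry; flag whatever is still rejected."""
    label = stage1(c1, x, margin)
    if label is not None:
        return label
    x_aug = list(x) + [acquire_feature(x)]  # feature exploration, simplified
    label = stage2(c2, x_aug, radius)
    return label if label is not None else "candidate-unknown"

# Toy setup (all values are made up for illustration).
c1 = {"A": [0.0], "B": [4.0]}             # centroids on the original feature
c2 = {"A": [0.0, 0.0], "B": [4.0, 0.0]}   # centroids after adding one feature
extra = {(0.3,): 0.1, (1.9,): 0.0, (2.1,): 5.0}  # stand-in for acquisition

def acquire_feature(x):
    """Hypothetical oracle returning the extra feature for an instance."""
    return extra[tuple(x)]

results = [cascade(x, acquire_feature, c1, c2, margin=1.0, radius=2.5)
           for x in ([0.3], [1.9], [2.1])]
print(results)
```

In this toy setup the first instance is accepted at stage 1, the second is ambiguous on the original feature but resolved once the extra feature is available, and the third remains far from every known class in the augmented space, so it is flagged as a candidate hidden class.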
