ML, LG | Mar 25, 2021

Prediction in the presence of response-dependent missing labels

arXiv:2103.13555v1
Originality: Incremental advance
AI Analysis

This paper addresses response-dependent missing labels in domains such as environmental monitoring, where sensing limitations produce biased training data. It is an incremental improvement over existing positive-unlabeled learning techniques.

The paper tackles missing labels in training data where the likelihood of missingness depends on the response, for example small forest fires that go undetected. It develops a method that jointly estimates occurrence and detection likelihood, and reports that the method outperforms existing state-of-the-art approaches on synthetic data and a California wildfire dataset.

In a variety of settings, limitations of sensing technologies or other sampling mechanisms result in missing labels, where the likelihood of a missing label in the training set is an unknown function of the data. For example, satellites used to detect forest fires cannot sense fires below a certain size threshold. In such cases, training datasets consist of positive and pseudo-negative observations, where a pseudo-negative observation can be either a true negative or an undetected positive with small magnitude. We develop a new methodology and non-convex algorithm, P(ositive) U(nlabeled) - O(ccurrence) M(agnitude) M(ixture), which jointly estimates the occurrence and detection likelihood of positive samples, utilizing prior knowledge of the detection mechanism. Our approach uses ideas from positive-unlabeled (PU) learning and from zero-inflated models that jointly estimate the magnitude and occurrence of events. We provide conditions under which our model is identifiable and prove that, even though our approach leads to a non-convex objective, any local minimizer has optimal statistical error (up to a log term) and projected gradient descent has geometric convergence rates. We demonstrate on both synthetic data and a California wildfire dataset that our method outperforms existing state-of-the-art approaches.
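
To make the setting concrete, here is a minimal Python sketch of response-dependent missing labels. It is not the paper's PU-OMM objective or algorithm. The logistic occurrence model, the Gaussian magnitude model, the hard detection threshold, and every name and parameter (simulate, theta, beta, threshold) are assumptions chosen only for illustration. The sketch shows that observed labels undercount true positives, and that the probability of an observed positive factors into an occurrence term and a detection term.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n, theta=1.0, beta=0.5, threshold=0.0, seed=0):
    """Return (x, observed_label, true_label) triples.

    Assumed toy model (hypothetical, not taken from the paper):
      occurrence: y ~ Bernoulli(sigmoid(theta * x))
      magnitude:  m ~ Normal(beta * x, 1) when y == 1
      detection:  the positive is recorded only if m > threshold
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(theta * x) else 0
        z = 0  # pseudo-negative unless a positive is detected
        if y == 1 and rng.gauss(beta * x, 1.0) > threshold:
            z = 1
        rows.append((x, z, y))
    return rows


def prob_observed_positive(x, theta, beta, threshold):
    # P(z=1 | x) = P(occurrence | x) * P(magnitude > threshold | x, occurrence)
    occurrence = sigmoid(theta * x)
    detection = 1.0 - normal_cdf(threshold - beta * x)
    return occurrence * detection


def neg_log_likelihood(rows, theta, beta, threshold):
    """Average negative log-likelihood of the observed labels only."""
    total = 0.0
    for x, z, _ in rows:
        p = prob_observed_positive(x, theta, beta, threshold)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p) if z == 1 else math.log(1.0 - p)
    return total / len(rows)


if __name__ == "__main__":
    rows = simulate(5000)
    observed = sum(z for _, z, _ in rows) / len(rows)
    true = sum(y for _, _, y in rows) / len(rows)
    print(f"observed positive rate: {observed:.3f}, true positive rate: {true:.3f}")
```

In this toy model a classifier trained on the observed labels alone would learn the product of the two terms rather than the occurrence probability, which is the bias the abstract describes. Separating the two factors requires additional structure, such as the prior knowledge of the detection mechanism that the abstract mentions.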

