Predictive Inference with Weak Supervision
This work addresses the challenge of model validation in machine learning when only partial or weak labels are available, offering a practical solution for scenarios with expensive labeling.
The paper tackles the problem of providing valid predictive confidence sets using weakly labeled data, introducing a new coverage definition that yields tighter and more informative confidence sets, as demonstrated through experiments.
The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework that uses weakly labeled data to provide valid predictive confidence sets: sets that cover a true label with a prescribed probability, independent of the underlying distribution. To do so, we introduce a (necessary) new notion of coverage and predictive validity, then develop several application scenarios, providing efficient algorithms for classification and several large-scale structured prediction problems. Through several experiments, we corroborate the hypothesis that the new coverage definition allows for tighter and more informative (but valid) confidence sets.
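To make the idea of conformal confidence sets from weak labels concrete, the following is a minimal Python sketch of split conformal calibration for classification. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not spell out the new coverage notion, so the sketch assumes one plausible reading in which each calibration example carries a set of candidate labels and a confidence set counts as covering when it contains at least one of them. The function names (`weak_score`, `conformal_threshold`, `prediction_set`) and the score `1 - probability` are hypothetical choices made for this example.

```python
import math


def weak_score(probs, weak_labels):
    """Nonconformity score of a weakly labeled example.

    ASSUMPTION (not from the abstract): the weak label is a set of candidate
    classes, and the score is the smallest value of 1 - p(y) over that set.
    """
    return 1.0 - max(probs[y] for y in weak_labels)


def conformal_threshold(scores, alpha):
    """Standard split-conformal quantile of the calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(probs, threshold):
    """All classes whose score 1 - p(y) does not exceed the threshold."""
    return {y for y, p in enumerate(probs) if 1.0 - p <= threshold}


if __name__ == "__main__":
    # Toy calibration data: (predicted class probabilities, weak label set).
    calibration = [
        ([0.7, 0.2, 0.1], {0, 1}),
        ([0.1, 0.6, 0.3], {2}),
        ([0.3, 0.3, 0.4], {0, 2}),
        ([0.2, 0.5, 0.3], {1}),
    ]
    scores = [weak_score(p, w) for p, w in calibration]
    tau = conformal_threshold(scores, alpha=0.5)
    print(prediction_set([0.5, 0.4, 0.1], tau))
```

Under this assumed definition, a new example's set intersects its weak label set exactly when its weak score is at most the threshold, so the usual exchangeability argument for split conformal prediction gives coverage of at least 1 - alpha in that sense. Scoring with the best candidate label, rather than requiring every candidate to be covered, is what would allow smaller sets; whether this matches the coverage notion introduced in the paper should be checked against the full text.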