ML LG · Aug 5, 2025

Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

Verónica Álvarez, Santiago Mazuelas, Steven An, Sanjoy Dasgupta

arXiv:2508.03896v1 · 1 citation · h-index: 2 · IEEE Trans Pattern Anal Mach Intell

Originality: Incremental advance

AI Analysis

This work helps machine learning practitioners assess how reliable weakly supervised labels are, and it offers a practical tool for dataset labeling. The advance is incremental because it builds on existing weak supervision techniques.

The paper tackles the problem of unreliable probabilistic label predictions in programmatic weak supervision by introducing a method that provides confidence intervals for label probabilities, resulting in more reliable predictions and showing improvement over state-of-the-art methods on multiple benchmark datasets.

The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. Furthermore, existing techniques for programmatic weak supervision cannot provide assessments for the reliability of the probabilistic predictions for labels. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology. Experiments on multiple benchmark datasets show the improvement of the presented methods over the state-of-the-art and the practicality of the confidence intervals presented.
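To make the setting concrete, here is a minimal Python sketch of programmatic weak supervision with an interval on a label probability. It is not the paper's method: it swaps in a simple naive-Bayes aggregation in which each LF is assumed to be independently correct with an accuracy known only up to a range, and it reports the smallest and largest posterior over that range. The LFs, the accuracy ranges, and all function names are hypothetical and chosen for illustration only.

```python
from itertools import product

ABSTAIN = -1  # value an LF returns when it offers no guess


# Hypothetical labeling functions for a toy spam task (1 = spam, 0 = not spam).
def lf_contains_link(text):
    return 1 if "http" in text else ABSTAIN


def lf_has_offer(text):
    return 1 if "free" in text.lower() else ABSTAIN


def lf_short_reply(text):
    return 0 if len(text.split()) <= 3 else ABSTAIN


LFS = [lf_contains_link, lf_has_offer, lf_short_reply]


def votes(text):
    """Apply every LF to one unlabeled example."""
    return [lf(text) for lf in LFS]


def posterior_spam(vote_list, accuracies, prior=0.5):
    """P(y = 1 | votes), assuming LFs err independently (a simplification)."""
    like1, like0 = prior, 1.0 - prior
    for vote, acc in zip(vote_list, accuracies):
        if vote == ABSTAIN:
            continue  # abstaining LFs carry no evidence in this toy model
        like1 *= acc if vote == 1 else 1.0 - acc
        like0 *= acc if vote == 0 else 1.0 - acc
    return like1 / (like1 + like0)


def probability_interval(vote_list, accuracy_bounds, prior=0.5):
    """Lower and upper posterior over a box of plausible LF accuracies.

    The posterior is monotone in each accuracy, so its extremes over the
    box are attained at corners; enumerating corners is therefore exact.
    Bounds must lie strictly between 0 and 1.
    """
    values = [
        posterior_spam(vote_list, corner, prior)
        for corner in product(*accuracy_bounds)
    ]
    return min(values), max(values)


example_votes = votes("Get free stuff at http://x.y now please")
bounds = [(0.6, 0.9)] * len(LFS)  # assumed accuracy range for every LF
low, high = probability_interval(example_votes, bounds)
print(f"votes={example_votes}, P(spam) in [{low:.3f}, {high:.3f}]")
```

A wide interval signals that the LF votes alone do not pin down the label probability, which is the kind of reliability assessment the abstract describes. The paper's uncertainty sets are more general than this box of accuracies and do not rely on the independence assumption used here.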
