LG, IT | May 29, 2025

Refining Labeling Functions with Limited Labeled Data

Chenjie Li, Amir Gilad, Boris Glavic, Zhengjie Miao, Sudeepa Roy

arXiv:2505.23470v2 | 1 citation | h-index: 24 | KDD

Originality: Incremental advance

AI Analysis

This work aims to make weak supervision more practical by refining labeling functions using only a small amount of labeled data. It is assessed as an incremental advance in the field.

The paper addresses the problem of improving labeling functions in programmatic weak supervision when only limited labeled data is available. It develops techniques that repair the functions by minimally changing their outputs on labeled examples, and shows experimentally that this improves LF quality even with small labeled datasets.

Programmatic weak supervision (PWS) significantly reduces human effort for labeling data by combining the outputs of user-provided labeling functions (LFs) on unlabeled datapoints. However, the quality of the generated labels depends directly on the accuracy of the LFs. In this work, we study the problem of fixing LFs based on a small set of labeled examples. Towards this goal, we develop novel techniques for repairing a set of LFs by minimally changing their results on the labeled examples such that the fixed LFs ensure that (i) there is sufficient evidence for the correct label of each labeled datapoint and (ii) the accuracy of each repaired LF is sufficiently high. We model LFs as conditional rules, which enables us to refine them, i.e., to selectively change their output for some inputs. We demonstrate experimentally that our system improves the quality of LFs based on surprisingly small sets of labeled datapoints.
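To make the idea of refining an LF concrete, the following minimal Python sketch treats an LF as a conditional rule and wraps it with one extra condition that overrides its output on some inputs. It is an illustration only, not the paper's repair algorithm: the names `refine`, `accuracy`, and `changed_outputs`, the spam/ham labels, and the three labeled examples are all hypothetical, and the refinement is chosen by hand rather than found by the system.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical label encoding for a toy spam task.
ABSTAIN, HAM, SPAM = -1, 0, 1

LF = Callable[[str], int]
Labeled = List[Tuple[str, int]]


def lf_contains_free(text: str) -> int:
    """Original LF: vote SPAM whenever the word 'free' appears, else abstain."""
    return SPAM if "free" in text.lower() else ABSTAIN


def refine(lf: LF, condition: Callable[[str], bool], new_label: int) -> LF:
    """Return an LF that outputs new_label where condition holds
    and defers to the original lf everywhere else."""
    def refined(text: str) -> int:
        return new_label if condition(text) else lf(text)
    return refined


def accuracy(lf: LF, labeled: Labeled) -> Optional[float]:
    """Accuracy over non-abstaining outputs; None if the LF always abstains."""
    votes = [(lf(x), y) for x, y in labeled if lf(x) != ABSTAIN]
    if not votes:
        return None
    return sum(v == y for v, y in votes) / len(votes)


def changed_outputs(old: LF, new: LF, labeled: Labeled) -> int:
    """Number of labeled examples on which the refinement changes the output."""
    return sum(old(x) != new(x) for x, _ in labeled)


# A tiny, made-up labeled set.
labeled = [
    ("Win a FREE phone now", SPAM),
    ("free gift card, click here", SPAM),
    ("Feel free to call me tomorrow", HAM),
]

# Hand-picked refinement: the phrase 'feel free' indicates HAM.
refined_lf = refine(lf_contains_free, lambda t: "feel free" in t.lower(), HAM)

print(accuracy(lf_contains_free, labeled))                      # accuracy before
print(accuracy(refined_lf, labeled))                            # accuracy after
print(changed_outputs(lf_contains_free, refined_lf, labeled))   # outputs changed
```

In this toy setup the refinement changes the LF's output on a single labeled example, which mirrors the abstract's goal of repairing LFs with minimal changes while raising each LF's accuracy. An actual system would have to search for such conditions and also verify that every labeled datapoint receives enough votes for its correct label.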
