LG · Oct 17, 2022

Regularized Data Programming with Automated Bayesian Prior Selection

Jacqueline R. M. A. Maasch, Hao Zhang, Qian Yang, Fei Wang, Volodymyr Kuleshov

arXiv:2210.08677v2 · 1.8 · h-index: 134

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in weakly supervised learning for practitioners facing high labeling costs, though, as an extension of existing data programming methods, it appears to be an incremental advance.

The paper addresses the failure of data programming to outperform unweighted majority voting in low-data contexts. It introduces a Bayesian extension that regularizes learning through informative priors and reports improved performance, interpretability, and robustness in low-data regimes.

The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes.
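The abstract does not spell out the model, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a simplified binary label model: LF outputs in {-1, 0, +1} (0 = abstain), conditionally independent LFs, a uniform class prior, and one accuracy parameter per LF, fit by EM. Placing a Beta prior on each accuracy turns the M-step into maximum a posteriori estimation (maximizing log-likelihood plus log-prior), and setting the pseudo-count `kappa` to 0 recovers maximum likelihood. As a hypothetical stand-in for the paper's majority-vote-based prior selection, each prior's mode is set to the LF's agreement rate with the unweighted majority vote. The function names, `kappa`, the clipping bounds, and the synthetic accuracies are all illustrative assumptions.

```python
import numpy as np


def majority_vote(L):
    """Unweighted majority vote over LF outputs in {-1, 0, +1} (0 = abstain).
    Ties and rows where every LF abstains return 0."""
    return np.sign(L.sum(axis=1)).astype(int)


def mv_prior_means(L, lo=0.55, hi=0.95):
    """Hypothetical prior selection: each LF's prior accuracy is its agreement
    rate with the majority vote, clipped to [lo, hi] (bounds are arbitrary)."""
    mv = majority_vote(L)
    means = np.empty(L.shape[1])
    for j in range(L.shape[1]):
        mask = (L[:, j] != 0) & (mv != 0)
        rate = (L[mask, j] == mv[mask]).mean() if mask.any() else 0.5
        means[j] = np.clip(rate, lo, hi)
    return means


def posterior_positive(L, alpha):
    """P(y_i = +1 | LF outputs) under the simplified model with a uniform
    class prior: log-odds = sum_j lambda_ij * log(alpha_j / (1 - alpha_j))."""
    w = np.log(alpha / (1 - alpha))
    return 1 / (1 + np.exp(-(L @ w)))


def fit_label_model(L, kappa=0.0, prior_means=None, n_iter=100, eps=1e-4):
    """EM for per-LF accuracies. kappa = 0 gives maximum likelihood; kappa > 0
    uses a Beta(kappa*mu + 1, kappa*(1 - mu) + 1) prior whose mode is mu,
    which makes the M-step a MAP update with kappa pseudo-counts."""
    n, m = L.shape
    if prior_means is None:
        prior_means = np.full(m, 0.7)
    alpha = np.full(m, 0.7)          # start above 0.5 to avoid a flipped solution
    n_fired = (L != 0).sum(axis=0)   # non-abstain count per LF
    for _ in range(n_iter):
        # E-step: soft latent labels.
        q = posterior_positive(L, alpha)
        # Expected number of times each LF agrees with the latent label.
        agree = ((L == 1) * q[:, None] + (L == -1) * (1 - q)[:, None]).sum(axis=0)
        # M-step: MAP estimate under the Beta prior (pseudo-count form).
        alpha = (agree + kappa * prior_means) / np.maximum(n_fired + kappa, eps)
        alpha = np.clip(alpha, eps, 1 - eps)
    return alpha, posterior_positive(L, alpha)


if __name__ == "__main__":
    # Synthetic data; accuracies and coverage below are made up for illustration.
    rng = np.random.default_rng(0)
    n, m = 200, 5
    y = rng.choice([-1, 1], size=n)
    acc = np.array([0.9, 0.8, 0.75, 0.65, 0.6])
    fires = rng.random((n, m)) < 0.5
    correct = rng.random((n, m)) < acc
    L = np.where(fires, np.where(correct, y[:, None], -y[:, None]), 0)

    mu = mv_prior_means(L)
    alpha_ml, _ = fit_label_model(L, kappa=0.0)
    alpha_map, q = fit_label_model(L, kappa=20.0, prior_means=mu)
    print("ML accuracies: ", np.round(alpha_ml, 3))
    print("MAP accuracies:", np.round(alpha_map, 3))
    print("MAP label accuracy:", (np.where(q >= 0.5, 1, -1) == y).mean())
    print("Majority vote accuracy:", (majority_vote(L) == y).mean())
```

In this sketch, larger `kappa` pulls each LF's accuracy toward its majority-vote-derived prior mode. This is one way a prior can keep estimates stable when only a few points are available to fit an unsupervised label model.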

