LG · AI · CV · ML · Mar 22, 2022

Generative Modeling Helps Weak Supervision (and Vice Versa)

arXiv:2203.12023v6 · 2 citations · h-index: 94
AI Analysis

This work addresses the data labeling bottleneck for machine learning practitioners by combining weak supervision and generative modeling in a novel way.

The authors tackled the problem of limited labeled data by fusing programmatic weak supervision with generative adversarial networks, resulting in improved label estimation, better image generation, and enhanced end-model performance through data augmentation on multiclass image classification datasets.

Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. The proposed approach captures discrete latent variables in the data alongside the label estimate derived from weak supervision. Alignment of the two allows for better modeling of sample-dependent accuracies of the weak supervision sources, improving the estimate of unobserved labels. It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels. Additionally, its learned latent variables can be inspected qualitatively. The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples.
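
To illustrate why sample-dependent accuracies of the weak supervision sources can change the label estimate, here is a minimal sketch of accuracy-weighted vote aggregation. It is not the paper's model: there is no GAN and nothing is learned. The function name `aggregate_votes`, the `ABSTAIN` convention, the uniform class prior, and the assumption that a wrong source errs uniformly over the other classes are all illustrative assumptions, and the per-sample accuracies are simply given as input.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: the source casts no vote on this sample


def aggregate_votes(votes, accuracies, num_classes):
    """Turn weak-source votes into per-sample class probabilities.

    votes:      (n_samples, n_sources) ints, a class index or ABSTAIN.
    accuracies: (n_samples, n_sources) floats in (0, 1); the assumed chance
                that each source is correct on each individual sample.
    Assumes a uniform class prior, conditionally independent sources, and
    errors spread evenly over the remaining classes.
    """
    votes = np.asarray(votes)
    acc = np.clip(np.asarray(accuracies, dtype=float), 1e-6, 1.0 - 1e-6)

    # Log-likelihood ratio contributed by one vote for a class.
    weights = np.log(acc / (1.0 - acc)) + np.log(num_classes - 1)

    scores = np.zeros((votes.shape[0], num_classes))
    for c in range(num_classes):
        # Abstentions never equal a class index, so they add nothing.
        scores[:, c] = np.sum(weights * (votes == c), axis=1)

    # Numerically stable softmax over classes.
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


# One sample, three sources: a single reliable source votes class 0,
# two weaker sources vote class 1.
example = aggregate_votes([[0, 1, 1]], [[0.9, 0.6, 0.6]], num_classes=2)
```

In this example a plain majority vote would pick class 1, while the accuracy-weighted estimate assigns probability 0.8 to class 0, because the single vote with assumed accuracy 0.9 outweighs the two votes with assumed accuracy 0.6.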
