Generative Adversarial Data Programming
This paper addresses the scarcity of labeled training data faced by machine learning practitioners in fields such as computer vision, and offers a novel framework that generates data together with its labels.
The paper tackles the bottleneck of limited hand-labeled training data by introducing Adversarial Data Programming (ADP), which takes a set of weak labeling functions and adversarially generates data together with a curated aggregated label. The framework extends to self-supervised labeled image generation, zero-shot text-to-labeled-image generation, transfer learning, and multi-task learning.
The paucity of large, curated, hand-labeled training datasets is a major bottleneck in deploying machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals, expressed as labeling functions, can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology that generates data as well as a curated aggregated label, given a set of weak labeling functions. Such labeling functions are also often easy to generalize, which allows our framework to be extended to different setups, including self-supervised labeled image generation, zero-shot text-to-labeled-image generation, transfer learning, and multi-task learning.
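To make the notion of weak labeling functions and an aggregated label concrete, the following is a minimal sketch and not the ADP method itself. The three labeling functions, the `ABSTAIN` convention, and the fixed weights are all hypothetical choices made for illustration; the sketch only shows how several noisy heuristics can vote on a label and how their votes can be combined into a single label distribution.

```python
# Illustrative sketch only: the heuristics, weights, and names below are
# assumptions for this example, not taken from the paper.

ABSTAIN = -1  # hypothetical convention: a labeling function may decline to vote


def lf_bright(pixels):
    """Vote class 1 if the mean intensity is above 0.5, else class 0."""
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


def lf_top_heavy(pixels):
    """Vote class 1 if the first half is brighter than the second half.

    Assumes at least two pixels so that both halves are non-empty.
    """
    half = len(pixels) // 2
    top, bottom = pixels[:half], pixels[half:]
    return 1 if sum(top) / len(top) > sum(bottom) / len(bottom) else 0


def lf_wide_range(pixels):
    """Vote class 1 if the intensity range is large, otherwise abstain."""
    return 1 if max(pixels) - min(pixels) > 0.8 else ABSTAIN


LABELING_FUNCTIONS = [lf_bright, lf_top_heavy, lf_wide_range]


def aggregate(votes, weights, num_classes=2):
    """Combine weak votes into a label distribution by weighted voting.

    Abstentions are ignored. If every function abstains, the result is
    a uniform distribution over the classes.
    """
    scores = [0.0] * num_classes
    for vote, weight in zip(votes, weights):
        if vote != ABSTAIN:
            scores[vote] += weight
    total = sum(scores)
    if total == 0:
        return [1.0 / num_classes] * num_classes
    return [s / total for s in scores]


def weak_label(pixels, weights):
    """Apply every labeling function to one sample and aggregate the votes."""
    votes = [lf(pixels) for lf in LABELING_FUNCTIONS]
    return aggregate(votes, weights)


if __name__ == "__main__":
    # Hypothetical per-function weights; a real system would estimate
    # these rather than fix them by hand.
    weights = [0.5, 0.3, 0.2]
    sample = [0.2, 0.1, 0.9, 1.0]  # a tiny flattened "image"
    print(weak_label(sample, weights))
```

The fixed weights here stand in for whatever aggregation a real system would learn; the point of the sketch is only that labels come from cheap, reusable heuristics rather than from per-sample annotation.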