LG | Jun 8, 2021

Labeled Data Generation with Inexact Supervision

arXiv:2106.04716v1 | 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of data scarcity for machine learning practitioners by using readily available inexact labels, though it appears incremental as it builds on existing generative methods for data augmentation.

The paper tackles the challenge of generating labeled data for supervised learning by leveraging inexact supervision, such as social media tags. It proposes the ADDES framework to synthesize high-quality labeled data, and experimental results on image and text datasets demonstrate its effectiveness.

Recent advances in deep learning have shown promising results in domains such as computer vision and natural language processing. The success of deep neural networks in supervised learning relies heavily on large amounts of labeled data. However, obtaining data with target labels is often difficult because of labeling cost and privacy concerns, which limits existing deep models. By contrast, it is relatively easy to obtain data with inexact supervision, i.e., labels or tags that are related to the target task. For example, social media platforms host billions of posts and images with user-defined tags, which are not the exact labels for a target classification task but are usually related to the target labels. It is promising to leverage these tags (inexact supervision) and their relations with the target classes to generate labeled data for downstream classification tasks. However, work on this is rather limited. Therefore, we study a novel problem of labeled data generation with inexact supervision. We propose a novel generative framework named ADDES, which synthesizes high-quality labeled data for target classification tasks by learning from data with inexact supervision and from the relations between inexact supervision and target classes. Experimental results on image and text datasets demonstrate the effectiveness of ADDES in generating realistic labeled data from inexact supervision to facilitate the target classification task.

