CL | AI | Aug 30, 2021

Want To Reduce Labeling Cost? GPT-3 Can Help

arXiv:2108.13487v1 | 701 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of expensive data labeling for NLP practitioners and offers a generalizable, cost-effective methodology, though the advance is incremental because it builds on existing GPT-3 capabilities.

The paper tackles the high cost of data annotation in NLP by using GPT-3 as a low-cost labeler. It finds that GPT-3 labels cost 50% to 96% less than human labels for the same downstream performance on NLU and NLG tasks, and it proposes a framework that combines GPT-3 labels with human labels for better results under a limited labeling budget.

Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 175 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than using labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance with limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
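
To make the budget arithmetic behind the abstract concrete, here is a minimal Python sketch of how a fixed labeling budget could be split between human labels and GPT-3 pseudo labels, and how a relative cost saving is computed. The function names, the per-label prices, and the `human_fraction` parameter are illustrative assumptions, not values or an algorithm taken from the paper.

```python
def labels_for_budget(budget_cents, human_cost_cents, gpt3_cost_cents, human_fraction):
    """Split a labeling budget between human and GPT-3 labels.

    All prices are hypothetical inputs in integer cents (an assumption made
    here to avoid floating-point rounding), not figures from the paper.
    human_fraction is the share of the budget spent on human labels.
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be in [0, 1]")
    human_budget = int(budget_cents * human_fraction)
    gpt3_budget = budget_cents - human_budget
    # Whole labels only: leftover cents that cannot buy a label are unused.
    n_human = human_budget // human_cost_cents
    n_gpt3 = gpt3_budget // gpt3_cost_cents
    return n_human, n_gpt3


def cost_saving(human_total, gpt3_total):
    """Relative saving of the GPT-3 route over the human route, assuming
    both totals buy the same downstream performance."""
    if human_total <= 0:
        raise ValueError("human_total must be positive")
    return 1.0 - gpt3_total / human_total
```

For example, `labels_for_budget(10_000, 50, 5, 0.5)` splits a hypothetical $100 budget evenly between the two sources at made-up prices of $0.50 per human label and $0.05 per GPT-3 label. How the two kinds of labels are then combined during training is the subject of the paper's framework and is not modeled here.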

Code Implementations: 1 repo
