CL · AI · Feb 16, 2022

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

arXiv:2202.07922v2 · 329 citations
AI Analysis

ZeroGen offers an annotation-free, efficient approach to zero-shot learning on NLP tasks such as text classification, question answering, and natural language inference. The contribution is incremental, however, since it builds on the existing generative capabilities of large pre-trained language models.

The paper tackles zero-shot learning by using large pre-trained language models (PLMs) to generate a dataset from scratch and then training a much smaller task model on it. Because the task model has orders of magnitude fewer parameters than the PLM, inference is efficient.

Interest in dataset generation has grown recently owing to the superior generative capacity of large pre-trained language models (PLMs). In this paper, we study ZeroGen, a flexible and efficient zero-shot learning method. Given a zero-shot task, we first generate a dataset from scratch using PLMs in an unsupervised manner. Then, we train a tiny task model (e.g., an LSTM) under the supervision of the synthesized dataset. This approach allows highly efficient inference, as the final task model has orders of magnitude fewer parameters than PLMs (e.g., GPT2-XL). Apart from being annotation-free and efficient, we argue that ZeroGen can also provide useful insights from the perspectives of data-free, model-agnostic knowledge distillation and unreferenced text generation evaluation. Experiments and analysis on different NLP tasks, namely text classification, question answering, and natural language inference, show the effectiveness of ZeroGen.
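The two-step pipeline in the abstract (synthesize a labeled dataset from a generator, then train a tiny task model on it) can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions. The prompt wording is illustrative. `toy_plm_generate` is a hypothetical stand-in for sampling continuations from a PLM such as GPT2-XL, so the sketch runs without a model download. A multinomial naive Bayes classifier is swapped in for the paper's small task models (e.g., an LSTM) to keep the example dependency-free.

```python
import math
import random
from collections import Counter, defaultdict

# Illustrative label-conditioned prompts (assumed wording, not from the paper).
PROMPTS = {
    "positive": 'The movie review in positive sentiment is: "',
    "negative": 'The movie review in negative sentiment is: "',
}

# Hypothetical stand-in for a large PLM: samples from tiny hand-written
# vocabularies instead of decoding a continuation of the prompt.
TOY_VOCAB = {
    "positive": ["great", "wonderful", "moving", "fun", "brilliant"],
    "negative": ["boring", "awful", "dull", "messy", "terrible"],
}
FILLER = ["the", "movie", "was", "plot", "acting", "really"]


def toy_plm_generate(prompt: str, rng: random.Random) -> str:
    """Hypothetical generator: returns a short 'continuation' of the prompt."""
    label = "positive" if "positive" in prompt else "negative"
    words = rng.sample(FILLER, 3) + rng.sample(TOY_VOCAB[label], 2)
    rng.shuffle(words)
    return " ".join(words)


def synthesize_dataset(n_per_label: int, seed: int = 0) -> list[tuple[str, str]]:
    """Step 1: build a labeled dataset from scratch; no human annotation."""
    rng = random.Random(seed)
    data = []
    for label, prompt in PROMPTS.items():
        for _ in range(n_per_label):
            # The label is known because it was used to condition the prompt.
            data.append((toy_plm_generate(prompt, rng), label))
    return data


class TinyNaiveBayes:
    """Step 2: a tiny task model trained only on the synthetic data."""

    def fit(self, data: list[tuple[str, str]]) -> "TinyNaiveBayes":
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for text, label in data:
            self.label_counts[label] += 1
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(count / total)
            for w in text.split():
                score += math.log(
                    (self.word_counts[label][w] + 1) / (n_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    synthetic = synthesize_dataset(n_per_label=50)
    model = TinyNaiveBayes().fit(synthetic)
    print(model.predict("a wonderful and moving film"))  # → positive
    print(model.predict("dull and terrible acting"))  # → negative
```

The point of the sketch is the division of labor: the expensive generator is used only offline to produce training data, while inference runs entirely on the small task model. In a real setup, `toy_plm_generate` would be replaced by sampling from an actual PLM, and the task model could be any small architecture.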

Code Implementations · 3 repos
