CLDec 16, 2021

AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models

arXiv:2112.08787v22 citationsHas Code
Originality Incremental advance
AI Analysis

This work addresses the need for reduced labeled data in NLP tasks, offering a domain-specific solution that is incremental but provides strong gains.

The paper tackles the problem of label efficiency in fine-tuning pre-trained language models by proposing AcTune, a framework that combines active learning and self-training using uncertainty to select samples for annotation or pseudo-labeling, achieving a 56.2% average improvement in label efficiency across six text classification datasets.

While pre-trained language model (PLM) fine-tuning has achieved strong performance in many NLP tasks, the fine-tuning stage can be still demanding in labeled data. Recent works have resorted to active fine-tuning to improve the label efficiency of PLM fine-tuning, but none of them investigate the potential of unlabeled data. We propose {\ours}, a new framework that leverages unlabeled data to improve the label efficiency of active PLM fine-tuning. AcTune switches between data annotation and model self-training based on uncertainty: it selects high-uncertainty unlabeled samples for active annotation and low-uncertainty ones for model self-training. Under this framework, we design (1) a region-aware sampling strategy that reduces redundancy when actively querying for annotations and (2) a momentum-based memory bank that dynamically aggregates the model's pseudo labels to suppress label noise in self-training. Experiments on 6 text classification datasets show that AcTune outperforms the strongest active learning and self-training baselines and improves the label efficiency of PLM fine-tuning by 56.2\% on average. Our implementation will be available at \url{https://github.com/yueyu1030/actune}.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes