Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates
This work targets the time-consuming annotation process that NLP researchers and practitioners face. Its contribution is incremental, since it builds on existing transfer learning and active learning methods.
The paper aims to reduce annotation costs for sequence tagging by combining deep pre-trained models with active learning. It finds that Bayesian uncertainty estimates and model distillation can achieve competitive performance with fewer labeled instances, reducing annotation by up to 50% in some cases.
Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing, in conjunction with active learning, open up the possibility of significantly reducing the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for the sequence tagging task. We conduct an extensive empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework and find the best combinations for different types of models. We also demonstrate that, for acquiring instances during active learning, a full-size Transformer can be replaced with a distilled version, which improves computational performance and lowers the barriers to applying deep active learning in practice.
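
To make the acquisition step concrete, the following is a minimal sketch of how Monte Carlo dropout samples could be turned into a Bayesian uncertainty score for choosing sentences to annotate. It assumes BALD (mutual information between predictions and model parameters) as the score and a mean over tokens as the sentence-level aggregation; the abstract does not say which scores or aggregations the study uses, so both are illustrative choices. The function names and the toy pool are hypothetical, and the per-token tag distributions stand in for the outputs of several stochastic forward passes of a tagger with dropout left on.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one distribution over tags."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_token(mc_probs):
    """BALD score for a single token.

    mc_probs: list of T tag distributions, one per stochastic forward pass.
    Returns entropy of the mean prediction minus the mean per-pass entropy.
    """
    t = len(mc_probs)
    n_tags = len(mc_probs[0])
    mean = [sum(sample[k] for sample in mc_probs) / t for k in range(n_tags)]
    expected_entropy = sum(entropy(sample) for sample in mc_probs) / t
    return entropy(mean) - expected_entropy


def bald_sentence(tokens):
    """Sentence score: mean of token scores (an assumed aggregation)."""
    scores = [bald_token(tok) for tok in tokens]
    return sum(scores) / len(scores)


def select_for_annotation(pool, budget):
    """Return the ids of the `budget` highest-scoring sentences in the pool.

    pool: dict mapping sentence id -> list of tokens, where each token is a
    list of T tag distributions from the MC dropout passes.
    """
    ranked = sorted(pool, key=lambda sid: bald_sentence(pool[sid]), reverse=True)
    return ranked[:budget]


# Toy pool: two one-token sentences, two tags, T = 2 dropout passes each.
toy_pool = {
    # Both passes agree, so there is no disagreement for BALD to pick up.
    "agree": [[[0.9, 0.1], [0.9, 0.1]]],
    # The passes contradict each other, so the score is clearly positive.
    "disagree": [[[0.9, 0.1], [0.1, 0.9]]],
}

chosen = select_for_annotation(toy_pool, budget=1)
```

In this toy pool the sentence whose dropout passes disagree is ranked first, which is the behaviour an uncertainty-based query strategy relies on. Under the same assumptions, the samples could come from a distilled model used only for acquisition while a full-size model is trained on the resulting labels, which is the substitution the abstract describes.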