CL · Jan 20, 2021

Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation

arXiv:2101.08106v2 · 12 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of deploying large pre-trained models in real-time applications for domains with limited training data, though it is an incremental improvement on existing knowledge distillation techniques.

The paper tackles the problem of knowledge distillation for BERT in data-scarce domains by proposing a method that learns to augment target data using resource-rich source domains, resulting in compressed student models that outperform the original teacher model with only ~13.3% of the parameters.

Although pre-trained language models such as BERT have achieved appealing performance in a wide range of natural language processing tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is to adopt knowledge distillation to compress these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the student's performance. To tackle this problem, we propose a method that learns to augment for data-scarce domain BERT knowledge distillation, by learning a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the proposed method generates samples acquired from a stationary distribution near the target data and adopts a reinforced selector to automatically refine the augmentation strategy according to the performance of the student. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines on four different tasks, and for the data-scarce domains, the compressed student models even perform better than the original large teacher model, with far fewer parameters (only ${\sim}13.3\%$), when only a few labeled examples are available.
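For readers unfamiliar with the teacher-student setup the abstract builds on, the following is a minimal sketch of a generic soft-target distillation loss. It is not the paper's objective: the abstract does not state the exact loss, and the function names, the temperature, and the mixing weight `alpha` are all illustrative assumptions. The learned augmentation and the reinforced selector are not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss for one example (assumed form).

    alpha weights the soft-target term; (1 - alpha) weights the hard-label term.
    """
    # Soft-target term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Hard-label term: ordinary cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

With few labeled target examples, both terms are averaged over very little data, which is the setting in which the abstract argues that augmenting the target domain helps the student.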

