CL | Oct 22, 2020

Knowledge Distillation for BERT Unsupervised Domain Adaptation

arXiv:2010.11478v22.342 citations | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses domain adaptation for sentiment classification, but the advance is incremental because it builds on existing adversarial and distillation methods.

The paper tackles performance degradation in BERT due to domain shifts by proposing adversarial adaptation with distillation (AAD), achieving state-of-the-art results in cross-domain sentiment classification across 30 domain pairs.

A pre-trained language model, BERT, has brought significant performance improvements across a range of natural language processing tasks. Since the model is trained on a large corpus of diverse topics, it shows robust performance for domain shift problems in which data distributions at training (source data) and testing (target data) differ while sharing similarities. Despite its great improvements compared to previous models, it still suffers from performance degradation due to domain shifts. To mitigate such problems, we propose a simple but effective unsupervised domain adaptation method, adversarial adaptation with distillation (AAD), which combines the adversarial discriminative domain adaptation (ADDA) framework with knowledge distillation. We evaluate our approach in the task of cross-domain sentiment classification on 30 domain pairs, advancing the state-of-the-art performance for unsupervised domain adaptation in text sentiment classification.
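The abstract names the two ingredients of AAD but gives no formulas, so the following is only a minimal sketch of how an ADDA-style adversarial term and a knowledge distillation term might be combined into one objective for the target encoder. It assumes the standard temperature-softened distillation loss and the usual inverted-label adversarial loss. The function names, the default temperature, and the unweighted sum are all hypothetical and are not taken from the paper or its code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions.

    Assumption: the teacher is the frozen source model and the student is
    the target encoder followed by the same classifier head.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def adversarial_loss(prob_source, eps=1e-12):
    """Inverted-label loss in the ADDA style.

    prob_source is the discriminator's probability that a target-encoder
    feature came from the source domain; the target encoder is rewarded
    when the discriminator is fooled (probability close to 1).
    """
    return -math.log(max(prob_source, eps))


def target_encoder_loss(teacher_logits, student_logits, prob_source,
                        temperature=2.0):
    """Hypothetical combined objective: adversarial term plus distillation term."""
    return (adversarial_loss(prob_source)
            + distillation_loss(teacher_logits, student_logits, temperature))


if __name__ == "__main__":
    # Toy two-class (negative/positive) logits, invented for illustration.
    loss = target_encoder_loss([2.0, -1.0], [1.5, -0.5], prob_source=0.4)
    print(f"combined loss: {loss:.4f}")
```

In this sketch the distillation term acts as a regularizer: it penalizes the target encoder for drifting away from the source model's predictions while the adversarial term pushes its features toward the source distribution.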
