CL · Oct 22, 2020

Knowledge Distillation for BERT Unsupervised Domain Adaptation

arXiv:2010.11478v2 · 42 citations
Originality: Incremental advance
AI Analysis

This paper addresses domain adaptation for sentiment classification, but its contribution is incremental: it builds on existing adversarial and distillation methods.

The paper tackles performance degradation in BERT due to domain shifts by proposing adversarial adaptation with distillation (AAD), achieving state-of-the-art results in cross-domain sentiment classification across 30 domain pairs.

A pre-trained language model, BERT, has brought significant performance improvements across a range of natural language processing tasks. Since the model is trained on a large corpus of diverse topics, it shows robust performance for domain shift problems in which data distributions at training (source data) and testing (target data) differ while sharing similarities. Despite its great improvements compared to previous models, it still suffers from performance degradation due to domain shifts. To mitigate such problems, we propose a simple but effective unsupervised domain adaptation method, adversarial adaptation with distillation (AAD), which combines the adversarial discriminative domain adaptation (ADDA) framework with knowledge distillation. We evaluate our approach in the task of cross-domain sentiment classification on 30 domain pairs, advancing the state-of-the-art performance for unsupervised domain adaptation in text sentiment classification.
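As a rough illustration of how an adversarial objective and a distillation objective might be combined for the target encoder, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the temperature value, the unweighted sum of the two terms, and the use of scalar inputs instead of BERT features are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft cross-entropy between teacher and student predictions.

    Scaled by temperature**2, as is common in knowledge distillation.
    In this sketch the teacher stands in for the source-trained model
    and the student for the model being adapted (an assumption).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    ce = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    return temperature ** 2 * ce


def adversarial_loss(disc_prob_source, eps=1e-12):
    """Encoder-side adversarial loss with flipped labels.

    disc_prob_source is the discriminator's probability that a
    target-domain feature came from the source domain; the encoder
    is rewarded when the discriminator is fooled (probability near 1).
    """
    p = min(max(disc_prob_source, eps), 1.0)
    return -math.log(p)


def aad_target_encoder_loss(teacher_logits, student_logits,
                            disc_prob_source, temperature=2.0):
    """Hypothetical combined objective: adversarial term plus distillation term."""
    return (adversarial_loss(disc_prob_source)
            + distillation_loss(teacher_logits, student_logits, temperature))


if __name__ == "__main__":
    # Toy two-class (negative/positive sentiment) logits.
    loss = aad_target_encoder_loss([2.0, -1.0], [1.5, -0.5], disc_prob_source=0.4)
    print(f"combined loss: {loss:.4f}")
```

In this sketch the distillation term acts as a regularizer: it penalizes the adapted model for drifting away from the source model's predictions while the adversarial term pushes its features toward domain confusion.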

Code Implementations: 1 repo