CL · Sep 13, 2021

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

arXiv:2109.05696v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for robust and efficient model compression in NLP, though the contribution is incremental, since it builds on existing KD methods.

The paper tackles the problem of comparing and improving knowledge distillation (KD) algorithms for natural language understanding. It evaluates them on in-domain, out-of-domain, and adversarial tests, and introduces Combined-KD, which achieves state-of-the-art results on the GLUE benchmark, on out-of-domain generalization, and on adversarial robustness.

Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one. Even though KD has shown promise on a wide range of Natural Language Processing (NLP) applications, little is understood about how one KD algorithm compares to another and whether these approaches can be complementary to each other. In this work, we evaluate various KD algorithms on in-domain, out-of-domain, and adversarial testing. We propose a framework to assess the adversarial robustness of multiple KD algorithms. Moreover, we introduce a new KD algorithm, Combined-KD, which takes advantage of two promising approaches (better training scheme and more efficient data augmentation). Our extensive experimental results show that Combined-KD achieves state-of-the-art results on the GLUE benchmark, out-of-domain generalization, and adversarial robustness compared to competitive methods.
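The abstract describes KD only at a high level. As a point of reference, here is a minimal sketch of the classic soft-target KD objective from Hinton et al. (2015): the student matches the teacher's temperature-softened output distribution while also fitting the gold label. This is the generic formulation, not the paper's Combined-KD, whose training scheme and data augmentation are not specified here. The function names, the temperature of 2.0, and the mixing weight `alpha` of 0.5 are illustrative assumptions.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label])


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # assumed value, for illustration only
    alpha: float = 0.5,        # assumed weight on the soft (distillation) term
) -> float:
    """Generic soft-target KD loss for a single example (not Combined-KD)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scale by T^2 so soft-term gradients keep a comparable magnitude as T changes.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard term uses the student's ordinary (T = 1) predictions.
    hard = cross_entropy(softmax(student_logits), label)
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(kd_loss(student, teacher, label=0))
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted hard-label loss remains; in practice this loss is averaged over a batch and minimized with respect to the student's parameters only.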

Code implementations: 1 repo