CL · AI · LG · Oct 16, 2021

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

arXiv:2110.08551v1661 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the large model size and slow inference of NLP models on resource-constrained devices. It is an incremental improvement on existing knowledge distillation techniques.

The paper compresses large pre-trained language models for deployment on resource-limited devices with a hierarchical relational knowledge distillation method. The authors report superior performance on public multi-domain datasets and strong few-shot learning ability.

On many natural language processing tasks, large pre-trained language models (PLMs) have substantially outperformed traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered their deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. To dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
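
As background for the term "knowledge distillation" used in the abstract, the sketch below shows the generic soft-label distillation loss, in which a student is trained to match a teacher's temperature-softened output distribution. It is not the HRKD method: the domain-relational graphs and the hierarchical compare-aggregate mechanism are not modeled here. The function names and the default temperature are assumptions chosen for illustration only.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T^2.

    Illustrative generic loss only; the temperature value is an assumption,
    not a setting taken from the paper.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print("loss when student differs:", distillation_loss(student, teacher))
    print("loss when student matches:", distillation_loss(teacher, teacher))
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise. In practice this term is usually combined with the ordinary task loss on ground-truth labels.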

Code Implementations: 1 repo