CL, LG | May 13, 2021

Distilling BERT for low complexity network training

arXiv:2105.06514v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient NLP models on resource-constrained devices such as mobile phones and the Raspberry Pi. Its contribution is incremental: it applies existing distillation techniques to specific student architectures.

The paper tackles the problem of transferring BERT's knowledge to simpler models such as BiLSTMs and CNNs for sentiment analysis on SST-2, showing that the distilled models achieve competitive performance while reducing inference complexity for edge devices.

This paper studies how efficiently BERT's learned knowledge can be transferred to low-complexity models such as a BiLSTM, a BiLSTM with attention, and shallow CNNs, using sentiment analysis on the SST-2 dataset. It also compares the inference complexity of the BERT model with that of these lower-complexity models, and underlines the importance of these techniques in enabling high-performance NLP models on edge devices such as mobiles, tablets and MCU development boards like the Raspberry Pi, and in enabling exciting new applications.
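For orientation, here is a minimal sketch of how knowledge transfer from a teacher (BERT) to a small student (for example a BiLSTM) is commonly set up. The summary above does not state which loss the paper uses, so everything here is an assumption: the sketch uses the widely used soft-target formulation, in which the student is trained on a weighted sum of the hard-label cross-entropy and a temperature-scaled KL divergence to the teacher's output distribution. The function names, `alpha`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    Assumed formulation (not confirmed by the source):
    alpha weights the hard-label term, (1 - alpha) the teacher-matching term.
    """
    # Hard-label term: cross-entropy of the student against the gold label.
    student_probs = softmax(student_logits)
    hard = -math.log(student_probs[label])

    # Soft-target term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # The temperature**2 factor keeps the soft-term gradient scale comparable
    # across different temperatures.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Binary sentiment example (as in SST-2): index 0 = negative, 1 = positive.
teacher = [-1.2, 2.3]   # hypothetical teacher logits for one sentence
student = [0.1, 0.4]    # hypothetical student logits for the same sentence
loss = distillation_loss(student, teacher, label=1)
```

In a real training loop the teacher logits would be computed once by a fine-tuned BERT and the loss minimized with respect to the student's parameters; only the student is needed at inference time, which is where the reduction in complexity on edge devices comes from.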

