LG · CL · IR · ML · Sep 8, 2019

Transformer to CNN: Label-scarce distillation for efficient text classification

arXiv:1909.03508v1 · 38 citations
Originality: Incremental advance
AI Analysis

This work addresses the cost of deploying NLP models in resource-constrained environments. Its contribution is incremental, since it builds on existing distillation techniques.

The paper tackles the problem of high computational cost and large model size in NLP by proposing a convolutional student architecture trained via distillation from a large-scale model, achieving 300x inference speedup and 39x parameter reduction while sometimes surpassing teacher performance.

Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, having been trained by a distillation process from a large-scale model, can achieve 300x inference speedup and 39x reduction in parameter count. In some cases, the student model performance surpasses its teacher on the studied tasks.
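To make the distillation step concrete, here is a minimal sketch of a soft-target distillation loss. The abstract does not state which loss the authors use, so this assumes the common formulation in which the student is trained to match the teacher's temperature-softened output distribution. The names `softmax`, `distillation_loss`, and `temperature`, and the example logits, are illustrative and not taken from the paper. Because the target comes from the teacher rather than from a gold label, a loss of this kind can be computed on unlabelled text.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's soft predictions against the teacher's.

    Assumption: soft-target matching; the paper's actual objective may differ.
    Extreme logits could underflow to log(0); a real implementation would
    use a log-softmax instead.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical logits for a three-class task (no gold label is needed).
teacher = [4.0, 1.0, -2.0]
close_student = [3.5, 1.2, -1.8]  # roughly agrees with the teacher
far_student = [-2.0, 1.0, 4.0]    # ranks the classes in reverse

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student whose logits track the teacher's receives a lower loss than one that ranks the classes in reverse, and the loss is smallest when the two distributions coincide.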

