LG CL IR ML

Sep 8, 2019

Transformer to CNN: Label-scarce distillation for efficient text classification

Yew Ken Chia, Sam Witteveen, Martin Andrews

arXiv:1909.03508v111.538 citations

Originality: Incremental advance

AI Analysis

This work addresses efficiency issues for deploying NLP models in resource-constrained environments, though it is incremental as it builds on existing distillation techniques.

The paper tackles the problem of high computational cost and large model size in NLP by proposing a convolutional student architecture trained via distillation from a large-scale model, achieving 300x inference speedup and 39x parameter reduction while sometimes surpassing teacher performance.

Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, having been trained by a distillation process from a large-scale model, can achieve 300x inference speedup and 39x reduction in parameter count. In some cases, the student model performance surpasses its teacher on the studied tasks.
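As a rough illustration of what "trained by a distillation process" can mean, the sketch below computes a generic soft-target distillation loss: the KL divergence between temperature-softened teacher and student output distributions. This is a common textbook formulation, not necessarily the loss used in the paper. The function names, the temperature value, and the example logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: a generic soft-target objective, assumed here
    for illustration rather than taken from the paper.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )


# Made-up logits for a 3-class text classifier.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student whose logits track the teacher's gets a loss near zero, while one that ranks the classes in the opposite order gets a larger loss. Because the targets come from the teacher rather than from gold labels, an objective of this kind can be computed on unlabelled text, which is what makes distillation attractive when labels are scarce.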
